Thursday, January 7, 2016

Compliance in the Age of Unlimited Data

During the recent CLS Conference in Toronto on financial industry compliance, it struck me that much of the discussion focused on the industry’s recognition of the scope and complexity that compliance teams face in today’s world. We live in a world with vast quantities of data, yet we struggle to organize it in meaningful ways. Speakers discussed changes to regulations and the strategies that various stakeholders are proposing to meet new and expanding reporting requirements, and all of them require data. The message that came across clearly is that everyone, from the regulator to the investor, is part of the compliance lifecycle and can contribute to its management and enforcement. I wondered how well most of these organizations have harnessed their data to support compliance. There was even a discussion on how to turn compliance into a profit centre based on the savings from simply being compliant: organizations can realize cost savings through fewer legal proceedings and avoided fines. As a data professional, I believe that the foundation of compliance, enforcement and identification is data. Most of the dialogue mentioned source information, but there was little discussion of the difficulty of collecting the data so that it is meaningful. From the compliance officer’s perspective, they need the data, and IT simply needs to provide it on a timely basis. As the speakers at the event reflected, this is often the most demanding and underestimated part of compliance. From an IT perspective, we need to serve users better by providing data in a more standardized and consumable format for both the organization and the regulators.

There is significant demand today for information, and compliance is one area in the financial industry where a truly comprehensive view of the data is needed more than in most. The compliance group in every financial institution generally needs to report on information from across the entire institution. This requires collecting data from numerous systems and organizing it in a way that is useful and meets compliance requirements. Most financial companies have so many systems and so much data that this task is daunting. Financial institutions need such a volume of data to support compliance that they are looking at how to store information so that it serves compliance needs but also provides additional value for analytics and prediction. This is not simply a storage problem but also a problem of integration, and this is where Big Data is lending a helping hand. We see many banks creating data repositories to service compliance requirements, with a vision of a more robust analytics platform in the future. Data is being stored and integrated to provide the foundation of information for today’s compliance needs and for improved analytics in the future. These data repositories become valuable resources because, for many institutions, this is the first time that data from across the various financial portfolios has been integrated and made available to analysts as well as compliance teams.

The other challenge we observed was how these same organizations need to become more proactive, addressing problems before they become compliance issues. Most reporting today, including compliance reporting, presents data at a point in time. This approach works well for reacting to events but does little to prevent future incidents. Our data solutions need to provide more actionable insights and identify problematic trends in advance. One such example is finding trends in questionable trading. Today most brokerages have systems that flag whether a single trade potentially violates trading rules or matches a trade model such as layering or spoofing. Most receive many of these alerts per day, and finding trends among them is difficult if not impossible. Based on a single trade alert, a compliance officer may identify a violation and take some action, but with so much data, the noise of false alerts makes it difficult to identify the true violations. However, when alerts are grouped together, we can see patterns that were not detectable at the individual level but that repeat over a longer time frame and begin to form an identifiable and actionable trend. This approach to analytics enables better and more proactive compliance. In this case, it provided new analytics that complemented the compliance requirements.
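To make the grouping idea concrete, here is a minimal Python sketch, assuming each surveillance alert can be reduced to a trader ID, an alert type and a date. The function name `recurring_patterns`, the 90-day window and the threshold of five alerts are illustrative assumptions, not parameters from any particular surveillance product. It counts single-trade alerts per trader and alert type over a rolling window and surfaces only the groups that repeat.

```python
from collections import Counter
from datetime import date, timedelta


def recurring_patterns(alerts, as_of, window_days=90, min_count=5):
    """Group single-trade alerts by (trader, alert type) over a rolling window.

    `alerts` is an iterable of hypothetical (trader_id, alert_type, alert_date)
    tuples. Groups with at least `min_count` alerts in the window are returned,
    most frequent first, as ((trader_id, alert_type), count) pairs.
    """
    start = as_of - timedelta(days=window_days)
    # Count only alerts that fall inside the review window.
    counts = Counter(
        (trader, kind) for trader, kind, day in alerts if start <= day <= as_of
    )
    # Keep groups that recur often enough to suggest a pattern, not noise.
    return sorted(
        ((key, n) for key, n in counts.items() if n >= min_count),
        key=lambda item: (-item[1], item[0]),
    )


# Six layering alerts for one trader and a single spoofing alert for another.
alerts = [("T1", "layering", date(2015, 12, d)) for d in range(1, 7)]
alerts.append(("T2", "spoofing", date(2015, 12, 3)))
print(recurring_patterns(alerts, as_of=date(2015, 12, 31)))
# [(('T1', 'layering'), 6)]
```

In practice the grouping keys and thresholds would be agreed with the compliance team, and a simple count is only the most basic signal; the point is that a pattern invisible in one alert becomes visible once alerts are aggregated over time.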

The conference illustrated the complexity that all compliance teams face, but it also showed how organizations can manage and implement compliance better. At the core, financial institutions are finding better ways to address compliance, and data is the foundation.

Thursday, October 22, 2015

Time to Revitalize Your Data Warehouse

The convergence of information, technology and analytics in the era of Big Data is revolutionizing how we approach our data solutions. We now have the capability to collect all data into a scalable repository that can give people the information to make better-informed decisions. The challenge is that many companies have been building data warehouses and reporting systems for years. Recent studies have shown that organizations, both small and large, are investing more in analytics while maintaining or reducing their traditional data warehouse spending. To me this seems an odd combination. How could one reduce data warehouse spending and increase analytics? My conclusion is that companies are maintaining their current data warehouses and building new data solutions to complement them. So, all things considered, the true picture may be that overall data spending is increasing, just in new ways.

This split focus between analytics and data warehousing is at odds with an enterprise approach to data. Organizations must consider how they can evolve their existing data warehouses and enable better analytics, not through revolution but through evolution. We must look at how Big Data is affecting our business and data processes and adjust how we architect our data warehouses accordingly.
According to Forbes, the average spending on data projects in 2015 was $7.4 million: enterprise organizations spent $13.8M, while SMBs spent an average of $1.6M. With this level of funding, the value must be realized. So how can we evolve our data warehouses? The key is finding the space where Big Data makes the most sense.

Today many organizations are at some stage in developing data hubs and landing areas using Hadoop technology. We tend to concentrate on the landing and staging areas, where we can make the most impact with the least disruption. By replacing these components in the data lifecycle, we can build a new region where data is collected and prepared to meet analytic needs, replacing areas that were previously built on a relational database at significant cost. The expanded landing and staging areas are now built with Big Data and analytics in mind, in addition to the traditional needs of business reporting. Data architects like myself are creating an environment that collects all of the data and then prepares it into a conformed arrangement. From there, structured data can be served to the data warehouse, while both unstructured and structured data remain available as raw information for data scientists and data analysts. This approach is intuitive, and it also enables a better data architecture because it separates the various parts of the solution. By separating landing and staging, we can use the technology that best suits each today while remaining mindful of the future. The same applies to the high-performance analytic platforms: we may choose an RDBMS such as Oracle or Netezza, which today is the most appropriate platform for traditional BI, but tomorrow may bring a new technology that is too appealing to ignore. By separating functionality from technology, we can evolve the data warehouse in a more agile way.
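As a rough illustration of separating these parts, the Python sketch below models landing, staging and serving as independent steps, assuming in-memory lists stand in for whatever storage (Hadoop, an RDBMS or otherwise) sits behind each one. The source names `crm` and `billing`, their field names and the `CONFORM` mappings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class LandedRecord:
    source: str          # originating system (hypothetical names below)
    payload: dict        # raw record kept exactly as received
    landed_at: datetime  # load timestamp, kept for lineage


def land(source, records):
    """Landing area: collect everything unchanged, with lineage metadata."""
    now = datetime.now(timezone.utc)
    return [LandedRecord(source, dict(r), now) for r in records]


# Hypothetical per-source mappings from raw field names to one conformed schema.
CONFORM: dict[str, Callable[[dict], dict]] = {
    "crm": lambda p: {"customer_id": str(p["cust_no"]), "region": p["rgn"].upper()},
    "billing": lambda p: {"customer_id": str(p["customer"]), "region": p["region"].upper()},
}


def stage(landed):
    """Staging area: conform raw records from every source to one structure."""
    return [CONFORM[r.source](r.payload) | {"source": r.source} for r in landed]


def serve_to_warehouse(staged):
    """Serving step: emit structured, de-duplicated rows for the warehouse load."""
    seen, rows = set(), []
    for row in staged:
        key = (row["customer_id"], row["region"])
        if key not in seen:
            seen.add(key)
            rows.append(key)
    return rows


landed = land("crm", [{"cust_no": 7, "rgn": "east"}]) + land(
    "billing", [{"customer": "7", "region": "East"}, {"customer": "9", "region": "west"}]
)
print(serve_to_warehouse(stage(landed)))
# [('7', 'EAST'), ('9', 'WEST')]
```

Because each step depends only on the output of the previous one, the storage behind any step could be swapped without rewriting the others, and the untouched landed records remain available to analysts who want the raw data.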

Replacing your landing and staging areas with Hadoop serves many purposes, including reducing costs and extending capacity, but primarily it creates a new environment to support modern advanced analytics. This evolution is needed to ensure that your data warehouse changes with the times rather than getting left behind in a highly competitive business world. Now is the time to consider renewing your data warehouse architecture and to see how Big Data can help elevate your business reporting and analytics.

Tuesday, May 26, 2015

Where is Data Taking Us?

I recently spoke at the Data Summit in NYC, where the focus was to “Unleash the Power of Your Data”. This was a great message for the event, as we live in a time when people are extolling the virtues of data, yet many struggle to grasp how data can change a business and how to harness its power. Today’s organizations have a good handle on how well-defined data can be used to show how they are doing. We produce reports on sales and costs, and we can easily discover what has happened, but the real business challenge is understanding why it happened and how to make the future better. Predicting the future is the real silver bullet that companies need, but one that will continue to be elusive. The conference weighed this question in a number of different ways, but all with the same undertone: data is a resource that must drive value.

In my presentation, “Analytics in the Time of Big Data”, I described how BI and analytics have changed over the years: an evolution from reporting for the masses, which was reactive and summarized what was happening, toward a future where results and effects can be accurately predicted and the right decision or action can be taken. Reporting is changing and growing, and we need to understand the opportunities it provides.

The most basic type of reporting or analytics is known as Descriptive Analytics. It is the most common method of reporting we encounter, but in many ways it is also the most limited. Descriptive Analytics describes what has happened: it provides the pulse of the business and is illustrated through standard reports and dashboards. The next is Predictive Analytics, which uses statistics, modeling, data mining and machine learning to analyze current data and predict what may occur. This provides a way to anticipate what may happen in the future based on the experiences of many. We see this type of reporting every day on the Web, in recommendations for content and products based on what we do, read and watch. The final and most productive form is Prescriptive Analytics, which takes predictive analytics, adds actionable data from both internal and external sources, and captures feedback to improve future analytics. In the oil and gas industry, prescriptive analytics is used to determine the best place to drill, the best methods to use, and the expected production of the well. This progression is natural, but significant resistance and a lack of knowledge are restricting businesses’ chances of evolving, and this was widely discussed during the event. The change is needed if businesses are going to compete, and companies need to start now, as the journey is not a short one.
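The difference between the three types can be sketched in a few lines of Python. This is a toy example under stated assumptions: the monthly sales figures and the candidate actions (with their uplift and cost) are made up, the predictive step is a plain least-squares trend line rather than a real forecasting model, and the prescriptive step omits the feedback loop that a real prescriptive system would use to improve itself.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative numbers only).
sales = {"2015-01": 100.0, "2015-02": 110.0, "2015-03": 120.0, "2015-04": 130.0}


def describe(series):
    """Descriptive: summarize what has already happened."""
    values = list(series.values())
    return {"total": sum(values), "average": mean(values), "latest": values[-1]}


def predict_next(series):
    """Predictive: fit a least-squares straight line to past periods and
    extrapolate one period ahead (a deliberately simple model)."""
    ys = list(series.values())
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(ys)


def prescribe(series, actions):
    """Prescriptive: combine the forecast with each candidate action's assumed
    uplift and cost, and recommend the action with the best net outcome."""
    baseline = predict_next(series)
    return max(actions, key=lambda a: baseline * (1 + a["uplift"]) - a["cost"])["name"]


# Hypothetical candidate actions with assumed effects.
actions = [
    {"name": "hold", "uplift": 0.0, "cost": 0.0},
    {"name": "promote", "uplift": 0.10, "cost": 10.0},
    {"name": "discount", "uplift": 0.20, "cost": 30.0},
]
print(describe(sales))            # {'total': 460.0, 'average': 115.0, 'latest': 130.0}
print(predict_next(sales))        # 140.0
print(prescribe(sales, actions))  # promote
```

Each step builds on the previous one: the description looks backward, the prediction projects the same data forward, and the prescription turns that projection into a recommended action.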

The challenge is to provide access to the data in a form that allows all of your data to be integrated in one place, in a governed and controlled area. We need to build data repositories that can support these needs, but is it realistic to think that all of your data will be in Hadoop or in an RDBMS? We need a solution that provides a federated view of all your data in a way that is meaningful for the business; this is known as data virtualization. In data virtualization, the structure of the data may not be well understood at first, but as it is used and applied we put meaning to it and create useful business definitions and relationships for use by the entire business. Consider it a schema-on-the-fly approach: it provides flexibility while keeping the core data available, and it gives users the ability to define the data they need when they need it. It merges data from both internal and external sources in a way that simplifies access for users by hiding the complexity of the underlying data. This solves the integration and availability of the data, but we will still need to change how we do analytics and look to our mathematicians to help us expand what we do and how we provide our services.
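A virtual view can be illustrated with a small Python sketch using only the standard library. It assumes two hypothetical sources, an internal account list and an external CSV feed of exchange rates, and a user-supplied mapping that gives the raw fields business meaning only when the view is queried. This is not how any particular data virtualization product works; it only shows the idea of querying sources in place through a definition applied at read time.

```python
import csv
import io
from typing import Callable, Iterator

# Two hypothetical sources left in place: an internal system exposed as rows,
# and an external feed delivered as CSV text. Neither is copied into a store.
INTERNAL_ACCOUNTS = [
    {"acct": "A1", "bal": "2500.00", "ccy": "CAD"},
    {"acct": "A2", "bal": "900.50", "ccy": "USD"},
]
EXTERNAL_RATES_CSV = "currency,to_cad\nCAD,1.0\nUSD,1.38\n"


def read_internal() -> Iterator[dict]:
    yield from INTERNAL_ACCOUNTS


def read_external() -> Iterator[dict]:
    yield from csv.DictReader(io.StringIO(EXTERNAL_RATES_CSV))


def virtual_view(mapping: dict[str, Callable[[dict], object]]) -> Callable[[], Iterator[dict]]:
    """Define a business view at read time: the mapping attaches names and types
    to raw fields only when the view is queried (schema on read)."""
    def query() -> Iterator[dict]:
        # Federate the sources at query time: join each account to its rate.
        rates = {r["currency"]: float(r["to_cad"]) for r in read_external()}
        for row in read_internal():
            context = {**row, "rate": rates[row["ccy"]]}
            yield {name: fn(context) for name, fn in mapping.items()}
    return query


# A user defines the fields they need, when they need them.
balances_in_cad = virtual_view({
    "account_id": lambda r: r["acct"],
    "balance_cad": lambda r: round(float(r["bal"]) * r["rate"], 2),
})
for row in balances_in_cad():
    print(row)
```

Because the view is only a definition, a different user could define another mapping over the same sources without any reload, which is the flexibility the schema-on-the-fly approach aims for.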

The evolution of reporting is underway, and we need to look at how we can provide our business analysts, business users and consumers with the information they need, when they need it, to empower them to make better and more accurate decisions.