Tuesday, April 19, 2016

Oracle Users In Action (A Collaborate16 Conference Review)

The recent COLLABORATE16 conference held by the 3 major Oracle user communities (IOUG, Quest and OAUG) brought together individuals focused on Oracle technology, middleware and applications such as JDE and PeopleSoft. The event was attended by about 6,000 professionals from around the world.

The 4 day event had many themes and introduced many ideas, but my thoughts were focused on the messages in the following areas:
  • The upcoming release of Oracle Database 13c. An updated version with expanded features. Still under NDA so we have little to discuss at the moment.
  • The Cloud
  • Business Intelligence (Discussion around the concept of bimodal BI)
  • Security
  • Big Data and Internet of Things
We all attend many sessions during a 4 day conference, and speakers either make an insightful statement which we later quote or use a quote to make a point. In one case the use of a quote made an impression on me. It was a quote by W. Edwards Deming from 1942 which continues to resonate today:

Scientific data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action. The step intermediate between the collection of data and the action is prediction.
- W. Edwards Deming, "On a Classification of the Problems of Statistical Inference", Journal of the American Statistical Association, June 1942.

Deming's point is that we should not collect data without a purpose, without a use case. He was working for the US Census Bureau and argued that unless data is useful, there is no reason to collect it. We need to consider the same when we work with data today. We should not try to boil the ocean; rather we should use data with focus.

Core to the conference was discussion around the database. Seminars covered backup, security, performance tuning and databases on appliances like Exadata. The conference also held Oracle 13c beta sessions covering new functions and features, to prepare people for the upcoming release. The next release of the Oracle database (13c) is expected around the Fall or sooner, depending on how beta testing progresses, but one can expect it will be out in time for Oracle OpenWorld at the latest. Onsite beta testing took place at the event for customers who are part of the beta program. Enterprise Manager 13c, which includes support for both on-premises and in-cloud databases, has been improved and extended.

A big topic of conversation was Cloud. It permeated all technology and applications. Oracle is focused on becoming the biggest Cloud provider of databases, applications and other components, delivered through its SaaS and PaaS strategy.

All products which Oracle provides are now available via the Oracle Cloud. This strategy allows for monthly feature and fix releases which are said to not impact past functionality. Current prices for the Oracle Cloud are quite aggressive in comparison to those of its competitors.

Consider Oracle’s new Data Visualization product. This product, which competes against products like Tableau, is currently only available as a single-user desktop version or as a Cloud service (https://cloud.oracle.com/en_US/data_visualization?resolvetemplatefordevice=true&tabID=1445271963053). This is indicative of the Cloud strategy, where software is developed so that the minimum viable product is made available and then development of features continues.

This leads into the new concept of Bi-Modal Business Intelligence. Bi-modal refers to the merging of two approaches to information management. The first is the traditional approach, where formal processes and policies are put in place to support well-defined and mature requirements. The second mode is more Agile and addresses issues through discovery. The following slide from Gartner provides some guidance on the differences between the two modes:
This approach was originally created to support a general change in how development works and to introduce Agile into the workplace; it has since been extended to BI. The following illustrates these modes as they relate to Business Intelligence:

One seminar talked about how the Oracle tool set addresses these 2 modes of operation. Oracle and a couple of the speakers answered the question by suggesting that the Oracle BI suite satisfies both sides of the equation. The idea is that OBIEE is the Mode 1 approach, with highly governed and structured reporting. Although the product does allow some individuals to create Mode 2 reports, the requirement for curated data defined by the semantic layer goes against some of the concepts within the definition of Mode 2. Oracle is now offering Data Visualization (DV), the product it has released to compete with Tableau. The advantage is that you can better govern access to data sources, which is a step ahead of Tableau, which thrives on using non-curated data. At this point this product should appeal to organizations that wish to reduce the complexity of the tools they support. DV is currently only available as a Cloud service. There was some discussion of making it available as a stand-alone version, but not as a server-based version.

Security continues to be a big concern for most database professionals. The speakers discussed methods for securing data, including masking and advanced methods of encryption. Oracle provides enough capabilities across its basic and advanced security features that no database today should be left unprotected. Generally the issues are caused by organizations not using what is available today. In the Big Data space, security continues to be an issue. The Data Lake presents new challenges in ensuring that data in Hadoop is protected. Most organizations today merely place Data Lakes in restricted areas/servers. This is really only the first step; security is still developing in this space and should be a major concern. In a recent survey, Oracle customers stated that security and performance are the two major concerns with the adoption of Big Data in their organization.

During the opening session the results of a survey of Oracle users were presented, covering what people are doing and planning and the biggest challenges they are experiencing. The key findings were that the adoption of Cloud is advancing while Big Data is moving with more caution. A good summary of this discussion may be found at: http://www.dbta.com/Editorial/News-Flashes/Ground-Breaking-Research-on-New-IT-Trends-Adoption-is-Presented-at-COLLABORATE-16-110361.aspx

The last area which I will focus on is that of Big Data and IoT. Conversation on this topic remains at the periphery of this group. While almost all attendees have Data Warehouses, few have active Big Data initiatives and fewer still have IoT initiatives. One interesting presentation was about using R for Data Quality. I found this to be a great approach to using the new Big Data tools to ensure quality within the repository. R provides many of the statistical methods which may be used to profile data, and I thought this was a good model for how one may implement DQ profiling. The presentation may be seen at: http://www.slideshare.net/michellekolbe/data-profiling-with-r
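
To make the idea of profiling concrete, here is a minimal sketch of the kind of per-column statistics a data quality profile might collect. It is written in Python using only the standard library rather than R, it is not taken from the presentation, and the function name profile_column and the sample values are assumptions made purely for illustration.

```python
from collections import Counter
from statistics import mean, median


def profile_column(values):
    """Return basic data-quality statistics for one column of raw values.

    Hypothetical helper for illustration; not from the presentation.
    """
    total = len(values)
    # Treat None and blank strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]

    # Collect the values that can be read as numbers.
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass

    profile = {
        "count": total,
        "missing": total - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0][0] if present else None,
    }
    # Summary statistics are only reported when every present value is numeric.
    if present and len(numeric) == len(present):
        profile.update(
            min=min(numeric),
            max=max(numeric),
            mean=mean(numeric),
            median=median(numeric),
        )
    return profile


# Made-up sample column with two missing entries.
ages = [34, 41, None, 29, "", 41]
age_profile = profile_column(ages)
```

Running a function like this over every column of a table gives a quick view of missing values, cardinality and value ranges before the data is used for reporting.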

I find it interesting that this user community continues to be very much focused on databases and not on the evolution of data. They are doing a good job of sharing information about the database, middleware and applications, but they are also more resistant to newer technologies like the transition to Big Data. This may be the nature of this group. Considering that the people here are the technologists who manage most of the business critical applications, the biggest trend for them is moving to the Cloud and, to a lesser extent, the data which is being generated by these applications.

Tuesday, February 16, 2016

Metadata is More Important than Ever

The challenge we all face in data today is ensuring that we use data as it is intended. Often data consumers are guilty of using data with little or no knowledge of its source or lineage. The more complex data gets, the bigger this problem becomes.

It is imperative that data organizations get a grip on what is meant by metadata and how best to include it within the processes which lead to data-driven decisions. Metadata is defined by many as data about data, but it is much more than that. It provides a description of the data, including its attributes, relationships and business rules, along with data about how to retrieve, use and manage it.

Metadata makes it easier for everyone to talk about data in a singular and consistent manner.
Therefore it is critical for long-term data success to get metadata collected and managed. This is not simply a one-time exercise but rather a continuous process. The task may seem daunting, but the reality is that you must eventually bite the bullet and get started or you are doomed to failure, or at least serious confusion.

To begin, the organization should look to its most business critical systems and clearly document their data. I would suggest that all organizations find a metadata tool to support this need. This can be difficult, as many of these tools are costly and organizations often have trouble justifying the cost. As an alternative you can manage it through a custom solution or Excel (if necessary), but find some way in which this data can be shared and utilized as desired.

Metadata is a complex subject and it is made more complex by Big Data. In this area we often say that we only enforce a schema at the time we require the data (schema on read). Although this is one of the powers of Big Data, it is also one of its troubling aspects. In many ways this goes against the basic foundation of data governance, but we cannot simply avoid it; we must deal with it. I believe that any data which we collect can be documented. We can capture metadata based on this data. It should document the possible uses for the data and the alternative structures which are available. Although the data may be “schemaless”, it still has structure and this should be captured and shared.
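
As a rough illustration of capturing structure from "schemaless" data, the sketch below scans a set of records and reports which fields appear, which types they take and whether they are optional. It is written in Python using only the standard library; the function name infer_structure and the sample records are assumptions made for this example, and a real metadata tool would capture far more (business rules, lineage, intended uses).

```python
from collections import defaultdict


def infer_structure(records):
    """Summarize the fields observed across a list of dict-like records.

    Hypothetical helper for illustration only.
    """
    fields = defaultdict(lambda: {"types": set(), "present_in": 0})
    for record in records:
        for name, value in record.items():
            # Record every type this field has been seen with.
            fields[name]["types"].add(type(value).__name__)
            fields[name]["present_in"] += 1

    total = len(records)
    return {
        name: {
            "types": sorted(info["types"]),
            # A field is optional if at least one record omits it.
            "optional": info["present_in"] < total,
        }
        for name, info in sorted(fields.items())
    }


# Made-up records with inconsistent structure.
events = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": "n/a", "note": "manual"},
]
structure = infer_structure(events)
```

Even this small summary surfaces things worth documenting: here the amount field arrives as both a number and a string, and note is only sometimes present.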

Ultimately we must realize that we cannot simply hope that our data is used as expected; we need to be sure it is.  

Thursday, January 7, 2016

Compliance in the Age of Unlimited Data

During the recent CLS Conference in Toronto on Financial Industry compliance, it struck me that a considerable amount of discussion focused on the industry’s recognition of the scope and complexity that Compliance teams face in today’s world. We live in a world with vast quantities of data but we struggle to organize it in meaningful ways. There was discussion around the changes to regulations and the strategies that various stakeholders are proposing to meet new and expanding reporting requirements, and all of them require data. The message that came across clearly is that everyone from the regulator to the investor is part of the compliance lifecycle and can contribute to its management and enforcement. I wondered how well most of these organizations have harnessed their data to support compliance. There was even a discussion on how to turn compliance into a profit centre based on the savings from simply being compliant: organizations can realize cost savings through fewer legal proceedings and avoided fines.

From my perspective as a data professional, I believe that the foundation of compliance, enforcement and identification is data. Most of the dialogue mentioned source information, but there was little discussion of the difficulty of collecting the data so that it is meaningful. From the compliance officer’s perspective, they need the data and IT just needs to provide it on a timely basis. As reflected by the speakers at the event, this is often the most demanding and underestimated part of compliance. From an IT perspective, we need to better serve the users by providing data in a more standardized and consumable format for both the organization and the regulators.

There is a significant demand today for information, and compliance is one area in the financial industry where a truly comprehensive view of the data is needed more than most. The compliance group in a financial institution generally needs to report on information from across the entire institution. This requires that data is collected from numerous systems and organized in a way that is useful and meets the compliance requirements. Most financial companies have so many systems and so much data that this task is daunting. Because the volume of data needed to support compliance is so large, institutions are looking at how to store information so that it serves compliance needs but also provides additional value for analytics and prediction. This is not simply a storage problem but also a problem of integration. This is where Big Data is lending a helping hand. We see many banks creating data repositories to service compliance requirements, but with a vision for a more robust platform for analytics in the future. Data is being stored and integrated to support compliance today and improved analytics in the future. These data repositories become valuable resources because, for many, this is the first time that data from across the various financial portfolios has been integrated and made available for use by analysts along with compliance teams.

The other challenge we observed was how these same organizations need to become more proactive to address problems before they become compliance issues. Most reporting today, including compliance reporting, presents data at a point in time. This approach is fine for reacting to incidents but does little to prevent future ones. Our data solutions need to provide more actionable insights and identify problematic trends in advance. One such example is finding trends in questionable trading. Today most brokerages have systems which identify whether a single trade is potentially a violation of trading rules or fits a trade model such as layering or spoofing. Most receive many of these alerts per day, and finding trends is difficult if not impossible. Based on a single trade alert, a compliance officer may identify it as a violation and take some action, but often, with so much data, the noise of false alerts makes it difficult to identify the true violation. However, when alerts are grouped together, we can see patterns that were not detectable at the individual level but are repeated over a longer time frame and begin to form an identifiable and actionable trend. This approach to analytics can enable better and more proactive compliance. In this case it provided new analytics which complemented the compliance requirements.
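
The grouping idea can be sketched in a few lines. The example below is a simplified illustration in Python, not a description of any brokerage's surveillance system: it groups alerts by trader and pattern and flags combinations that recur within a rolling time window. The tuple layout, the function name repeated_alerts, the 30 day window and the threshold of 3 are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeated_alerts(alerts, window_days=30, threshold=3):
    """Flag (trader, pattern) pairs with repeated alerts in a rolling window.

    alerts: iterable of (trader_id, pattern, alert_date) tuples.
    Returns {(trader_id, pattern): largest alert count seen in any window}
    for pairs whose count reaches the threshold. Hypothetical helper.
    """
    by_key = defaultdict(list)
    for trader_id, pattern, alert_date in alerts:
        by_key[(trader_id, pattern)].append(alert_date)

    window = timedelta(days=window_days)
    flagged = {}
    for key, dates in by_key.items():
        dates.sort()
        start = 0
        best = 0
        for end, current in enumerate(dates):
            # Move the left edge until the window spans at most window_days.
            while current - dates[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= threshold:
            flagged[key] = best
    return flagged


# Made-up alerts: individually unremarkable, but T1 repeats one pattern.
sample_alerts = [
    ("T1", "layering", date(2016, 1, 4)),
    ("T1", "layering", date(2016, 1, 11)),
    ("T1", "layering", date(2016, 1, 25)),
    ("T1", "spoofing", date(2016, 1, 6)),
    ("T2", "spoofing", date(2016, 1, 5)),
    ("T2", "spoofing", date(2016, 3, 20)),
]
trends = repeated_alerts(sample_alerts)
```

In this sample only trader T1's layering alerts are flagged, since three of them fall within the same 30 day window, while the isolated alerts stay below the threshold.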

The conference illustrated the complexity which all compliance teams face but also shared how organizations can better manage and implement compliance. At the core, financial institutions are finding better ways to address compliance, and data is the foundation.