Tuesday, February 16, 2016
The challenge we all face in data today is ensuring that we use data as it is intended. Often data consumers are guilty of using data with little or no knowledge of its source or lineage. The more complex data gets, the bigger this problem becomes.
It is imperative for data organizations to get a grip on what is meant by metadata and how best to include it within the processes which lead to data-driven decisions. Metadata is defined by many as data about data, but it is much more than that. It provides a description of the data, including its attributes, relationships and business rules, along with data about how to retrieve, use and manage it.
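To make that description concrete, here is a minimal sketch in Python of what a single metadata record might hold. The class names, field names and the "orders" example are all hypothetical and chosen only for illustration; a real metadata tool will have its own model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributeMetadata:
    """Describes one attribute (column or field) of a dataset."""
    name: str
    data_type: str
    description: str


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record: data about one dataset."""
    name: str
    source: str  # where the data comes from (its lineage)
    attributes: List[AttributeMetadata] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)
    business_rules: List[str] = field(default_factory=list)
    retrieval: str = ""  # how a consumer retrieves the data

    def describe(self) -> str:
        """Return a plain-text summary that can be shared with data consumers."""
        lines = [f"Dataset: {self.name} (source: {self.source})"]
        for attr in self.attributes:
            lines.append(f"  {attr.name} [{attr.data_type}]: {attr.description}")
        for rel in self.relationships:
            lines.append(f"  Relationship: {rel}")
        for rule in self.business_rules:
            lines.append(f"  Rule: {rule}")
        if self.retrieval:
            lines.append(f"  Retrieval: {self.retrieval}")
        return "\n".join(lines)


# Illustrative record; every value here is made up for the example.
orders = DatasetMetadata(
    name="orders",
    source="order entry system",
    attributes=[AttributeMetadata("order_id", "integer", "Unique order identifier")],
    relationships=["orders.customer_id refers to customers.customer_id"],
    business_rules=["An order total must not be negative"],
    retrieval="Query the orders table in the reporting database",
)
print(orders.describe())
```

Even a record this small answers the questions a data consumer should ask before using the data: where it came from, what each attribute means, and which rules apply to it.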
Metadata makes it easier for everyone to talk about data in a singular and consistent manner.
Therefore it is critical for long-term data success to get metadata collected and managed. This is not simply a one-time exercise but rather a continuous process. The task may seem daunting, but the reality is that you must eventually bite the bullet and get started, or you are doomed to failure or at least serious confusion.
To begin, the organization should look to its most business-critical systems and clearly document their data. I would suggest that all organizations find a metadata tool to support this need. This can be difficult, as many of these tools are costly and organizations often have trouble justifying their cost. As an alternative, you can manage it through a custom solution or Excel (if necessary), but find some way in which the data can be shared and utilized as desired.
Metadata is a complex subject and it is made more complex by Big Data. In this area we often say that we only enforce a schema at the time we require the data. Although this is one of the powers of Big Data, it is also one of its more troubling aspects. In many ways this goes against the basic foundation of data governance, but we cannot simply avoid it; we must deal with it. I believe that any data which we collect can be documented. We can capture metadata based on this data. It should document the possible uses for the data and the alternative structures which are available. Although the data may be “schemaless”, it still has structure, and this should be captured and shared.
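As a rough illustration of capturing structure from “schemaless” data, the sketch below scans a handful of records and reports which fields appear, which types each field takes, and how often it is present. The function name `infer_structure` and the sample events are hypothetical, and this is only one simple approach under the assumption that records arrive as flat key/value dictionaries; nested data would need more work.

```python
from collections import defaultdict


def infer_structure(records):
    """Summarize the observed structure of a list of flat dict records.

    For each field, report the sorted type names seen and the number of
    records that contain the field.
    """
    observed = defaultdict(lambda: {"types": set(), "count": 0})
    for record in records:
        for key, value in record.items():
            observed[key]["types"].add(type(value).__name__)
            observed[key]["count"] += 1
    return {
        key: {"types": sorted(info["types"]), "count": info["count"]}
        for key, info in observed.items()
    }


# Made-up sample events with inconsistent structure.
events = [
    {"user_id": 1, "action": "login"},
    {"user_id": "42", "action": "purchase", "amount": 19.99},
]

structure = infer_structure(events)
for field_name, info in structure.items():
    print(field_name, info["types"], info["count"])
```

A summary like this shows the alternative structures present in the data (here, `user_id` arriving as both a number and a string, and `amount` present in only some records), which is exactly the kind of information worth recording and sharing as metadata.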
Ultimately we must realize that we cannot simply hope that our data is used as expected; we need to be sure it is.