The challenge we all face in data today is ensuring that we
use data as it is intended. Often data consumers are guilty of using data with
little or no knowledge of its source or lineage. The more complex data gets,
the bigger this problem becomes.
It is imperative for data organizations to get a grip on what
is meant by metadata and how best to include it in the processes that lead
to data-driven decisions. Metadata is defined by many as data about data, but
it is much more than that. It describes the data, including its
attributes, relationships, and business rules, along with information about how to
retrieve, use, and manage it.
Metadata makes it easier for everyone to talk
about data in a common, consistent way.
Therefore, collecting and managing metadata is critical to
long-term data success. This is not a one-time exercise but rather
a continuous process. The task may seem daunting, but the reality is that you
must eventually bite the bullet and get started, or you are doomed to failure, or
at least to serious confusion.
To begin, the organization should look to its most
business-critical systems and clearly document their data. I would suggest that every
organization find a metadata tool to support this need. This can be
difficult, as many of these tools are costly and organizations often have trouble
justifying the expense. As an alternative, you can manage metadata through a custom
solution or Excel (if necessary), but find some way for it to be shared and
used as needed.
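As one illustration of the custom route, here is a minimal Python sketch of a home-grown metadata register. It assumes each dataset's entry covers the elements named earlier (attributes, relationships, business rules, and how to retrieve the data); the `DatasetMetadata` class, its field names, the `to_csv` helper, and the `orders` sample entry are all hypothetical. Flattening the register to CSV, one row per attribute, is one simple way to let people who only have Excel open and share it.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """One entry in a hypothetical home-grown metadata register."""
    name: str
    source_system: str
    description: str
    # column name -> plain-language meaning
    attributes: dict[str, str] = field(default_factory=dict)
    # other datasets or keys this one joins to or derives from
    relationships: list[str] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)
    how_to_retrieve: str = ""
    owner: str = ""


COLUMNS = ["dataset", "source_system", "description", "attribute",
           "attribute_meaning", "relationships", "business_rules",
           "how_to_retrieve", "owner"]


def to_csv(entries: list[DatasetMetadata]) -> str:
    """Flatten the register to CSV text, one row per attribute."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for entry in entries:
        # A dataset with no documented attributes still gets one row.
        attrs = entry.attributes or {"": ""}
        for attr, meaning in attrs.items():
            writer.writerow([
                entry.name, entry.source_system, entry.description,
                attr, meaning,
                "; ".join(entry.relationships),
                "; ".join(entry.business_rules),
                entry.how_to_retrieve, entry.owner,
            ])
    return buffer.getvalue()


# Hypothetical example entry, for illustration only.
orders = DatasetMetadata(
    name="orders",
    source_system="ERP",
    description="One row per customer order",
    attributes={"order_id": "Unique order identifier",
                "amount": "Order total in USD"},
    relationships=["customers.customer_id"],
    business_rules=["amount must be non-negative"],
    how_to_retrieve="Nightly extract from the ERP database",
    owner="Finance data team",
)

print(to_csv([orders]))
```

Writing the output to a `.csv` file instead of printing it gives a sheet that opens directly in Excel; a dedicated metadata tool would replace this once the cost can be justified.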
Metadata is a complex subject, and Big Data makes it even
more complex. In this area we often say that we enforce a schema only at the time
we read the data (schema-on-read). Although this is one of the strengths of Big Data, it is also
one of its more troubling aspects. In many ways it runs counter to the basic
foundations of data governance, but we cannot simply avoid it; we must deal with
it. I believe that any data we collect can be documented, and we can capture
metadata based on that data. This metadata should document the possible uses of the data
and the alternative structures that are available. Although the data may be “schemaless,”
it still has structure, and that structure should be captured and shared.
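To make the idea of capturing structure from "schemaless" data concrete, here is a small Python sketch. It assumes the data arrives as newline-delimited JSON; the `infer_structure` and `_walk` names and the sample records are hypothetical. For every field path it records which JSON types were observed and what fraction of records contain the field, which is the kind of summary that could be stored as metadata next to the dataset. It does not descend into arrays, only noting that an array is present.

```python
import json
from collections import defaultdict

# Map Python values produced by json.loads to JSON type names.
JSON_TYPES = {dict: "object", list: "array", str: "string",
              bool: "boolean", int: "number", float: "number",
              type(None): "null"}


def _walk(value, path, seen):
    """Record the JSON type found at `path`, recursing into nested objects."""
    seen[path].add(JSON_TYPES[type(value)])
    if isinstance(value, dict):
        for key, child in value.items():
            _walk(child, f"{path}.{key}" if path else key, seen)


def infer_structure(lines):
    """Summarize the implicit structure of newline-delimited JSON records.

    Returns {field_path: {"types": [...], "present_in": fraction_of_records}}.
    """
    seen_types = defaultdict(set)
    counts = defaultdict(int)
    total = 0
    for line in lines:
        record = json.loads(line)
        total += 1
        per_record = defaultdict(set)
        _walk(record, "", per_record)
        for path, types in per_record.items():
            if path:  # skip the root object itself
                seen_types[path] |= types
                counts[path] += 1
    return {path: {"types": sorted(seen_types[path]),
                   "present_in": counts[path] / total}
            for path in sorted(seen_types)}


# Hypothetical sample records with inconsistent shapes.
sample = [
    '{"id": 1, "user": {"name": "Ann"}, "tags": ["a"]}',
    '{"id": "2", "user": {"name": "Bo", "email": null}}',
]

for path, info in infer_structure(sample).items():
    print(path, info)
```

A field such as `id` showing up as both a number and a string, or `user.email` appearing in only some records, is exactly the kind of structural detail worth documenting before consumers start relying on the data.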
Ultimately, we must realize that we cannot simply hope that
our data is used as intended; we need to be sure it is.