Monday, May 27, 2013

Designing a Future-Proof Data Solution



The challenges we face today when designing solutions is how do we avoid the pitfalls of constant design changes? How can we reduce the impact to our data designs? Is it even possible?
The design of a data warehouse has been well discussed and debated over the years. The battle between Ralph Kimball and Bill Inmon over the years is legendary. The choice of an Information Factory versus a Dimensional approach continues to be one which all new data warehouses need to consider. In this discussion the choice is really immaterial. Whether you choose either design approach you will still need to consider how the design will be developed. Can we build the design incrementally? Can we minimize the impact of the overall project and minimize regression testing? This is the constant challenge we face when developing data solution at EPAM, especially when using Agile practices to drive our project success. The key is to design and develop once and to evolve the design as you go but there are some key considerations you will need to make when designing to optimize the design and minimize the refactoring which may be required by design changes.

The first item you must consider is using Design Patterns for Data Warehouse modelling techniques. This approach basically says that all objects will be built using templates which can be used to address most of the needs within your design. This means, that similar tables in the design will follow predefined patterns. At the most basic level we predefine what a dimension and a fact will look like. They will include a surrogate key and the various attributes required by each. In addition they will include control fields to allow us to manage how and when data is processed. For more complex facts or dimensions we also provide a template which allows us to support all of the different slowly changing dimensions as well as to manage quickly changing facts, both of these more advanced design methods provide us with the ability to manage the data effectively and consistently. For relationships we look to an approach where we define intermediate tables to manage relationships. We build “bridge” tables for this purpose which provide a reliable manner to relate facts and dimensions to improve performance and extend query capabilities in the future and form a key part of allowing the model to work across multiple subject areas.

The second consideration is to Design with the Future in Mind. In this situation you are faced with the choice of building based only on defined requirements with little consideration for future requirements. In the Agile context this seems like an obvious approach of design what you need when you need it. The concept of Just-in-Time Design is one which has been discussed and developed in the past few years. However when we put this practice the reality is that you want to try and define your facts and dimensions as completely as possible at the time you first design it to ensure that you design for the future needs in addition to the ones you have at the moment. This will result in additional attributes which might only be used in a much later sprint but are defined in order to reduce refactoring. In addition it may be necessary to define additional dimensions so that you will minimize the rework when it comes to adding dimensions to fact tables in the future. The key is to design what you need when you need it and provide as much forward thinking in your object definitions as early in the process as possible.

The final suggestion I would have to future-proof things is to ensure that your data warehouse is designed to support the integration of multiple data sources right from the start. So add additional attributes and ETL functionality which supports this approach. The data warehouse is really all about providing the business with an integrated and reliable solution; therefore you must design with the goal of integration from the beginning.

Ultimately the design and development of a data warehouse requires the data architect and data modeler to look to the future. They need to anticipate data requirements and to try and define that data objects and relationships as completely as possible right from the start and you can avoid the many pitfalls of a data warehouse design by designing with the end in mind while allowing the design to evolve based on business needs.