The challenges we face today when designing solutions is how
do we avoid the pitfalls of constant design changes? How can we reduce the
impact to our data designs? Is it even possible?
The design of a data warehouse has been well discussed and
debated over the years. The battle between Ralph Kimball and Bill Inmon over
the years is legendary. The choice of an Information Factory versus a
Dimensional approach continues to be one which all new data warehouses need to
consider. In this discussion the choice is really immaterial. Whether you
choose either design approach you will still need to consider how the design
will be developed. Can we build the design incrementally? Can we minimize the
impact of the overall project and minimize regression testing? This is the
constant challenge we face when developing data solution at EPAM, especially
when using Agile practices to drive our project success. The key is to design
and develop once and to evolve the design as you go but there are some key
considerations you will need to make when designing to optimize the design and
minimize the refactoring which may be required by design changes.
The first item you must consider is using Design Patterns for Data Warehouse modelling
techniques. This approach basically says that all objects will be built using templates
which can be used to address most of the needs within your design. This means, that
similar tables in the design will follow predefined patterns. At the most basic
level we predefine what a dimension and a fact will look like. They will
include a surrogate key and the various attributes required by each. In addition
they will include control fields to allow us to manage how and when data is
processed. For more complex facts or dimensions we also provide a template
which allows us to support all of the different slowly changing dimensions as
well as to manage quickly changing facts, both of these more advanced design
methods provide us with the ability to manage the data effectively and
consistently. For relationships we look to an approach where we define
intermediate tables to manage relationships. We build “bridge” tables for this
purpose which provide a reliable manner to relate facts and dimensions to
improve performance and extend query capabilities in the future and form a key
part of allowing the model to work across multiple subject areas.
The second consideration is to Design with the Future in Mind. In this situation you are faced
with the choice of building based only on defined requirements with little
consideration for future requirements. In the Agile context this seems like an
obvious approach of design what you need when you need it. The concept of
Just-in-Time Design is one which has been discussed and developed in the past
few years. However when we put this practice the reality is that you want to
try and define your facts and dimensions as completely as possible at the time
you first design it to ensure that you design for the future needs in addition
to the ones you have at the moment. This will result in additional attributes
which might only be used in a much later sprint but are defined in order to
reduce refactoring. In addition it may be necessary to define additional
dimensions so that you will minimize the rework when it comes to adding
dimensions to fact tables in the future. The key is to design what you need
when you need it and provide as much forward thinking in your object
definitions as early in the process as possible.
The final suggestion I would have to future-proof things is
to ensure that your data warehouse is designed to support the integration of multiple
data sources right from the start. So add additional attributes and ETL
functionality which supports this approach. The data warehouse is really all
about providing the business with an integrated and reliable solution;
therefore you must design with the goal of integration from the beginning.
Ultimately the design and development of a data warehouse requires
the data architect and data modeler to look to the future. They need to
anticipate data requirements and to try and define that data objects and
relationships as completely as possible right from the start and you can avoid
the many pitfalls of a data warehouse design by designing with the end in mind
while allowing the design to evolve based on business needs.