Thursday, March 24, 2011

Open Season on the Database

For years we have contemplated open-source databases and how they fit into our overall database technology strategy. The question of which is the right choice for an organization is complex and has many answers. It is no longer a simple question of choosing between Oracle, Microsoft or even IBM; no, it’s a choice that needs to include MySQL (unless that counts as Oracle?), Postgres, and now Hadoop and Cassandra. The choices are as complex as the systems we need to support. From my perspective it is now open season on the database. The choice is no longer straightforward.

The emergence of “The Cloud” and the need to distribute more data more effectively have added new pressures. Analyzing massive amounts of data more efficiently is now a must-have. We look to the Oracles of the world, and they respond with a system that is scalable and efficient and leverages the power of Oracle to enable the business. The Oracle database on an Exadata server is a solid enterprise-grade solution that will support this need. It is robust and powerful, but it is costly. Organizations that need similar functionality but want to get it from open-source technology or some lower-cost model of software now also have options.

At Thoughtcorp we are leading providers in many areas of technology. Our team has varied knowledge, much of it focused on today’s latest technologies, including wireless, apps and of course the use of open-source technologies to help organizations achieve their goals. The data group has seen this shift as well, and I am now looking at how products like Hadoop can be used for massive data analysis. Its file system is modeled on the Google File System (GFS), and it allows for the collection and analysis of data. From Hadoop other solutions have emerged. Hive provides an easy-to-use tool that works with Hadoop to simplify analysis. Now Hadoop has been combined with Cassandra, an open-source database from Facebook. What does this give us? A real-time database with big data capabilities. Wow…that’s a mouthful.

Of course, these open solutions today come with some operational risks due to possible single points of failure. But if you can manage the risk, you may have an alternative to today’s enterprise databases. These new data options give us more choice, and they leave you with a question: if it’s good enough for Facebook and Google, is it something I should consider?

Tuesday, March 8, 2011

Building Data Warehouses for the Masses

Data warehouse designers and architects are often accused of building systems which do not serve the enterprise as a whole but only focused portions of the organization. I have seen this occur at numerous clients who did not invest in a long-term vision and a long-term data strategy.

So how do we avoid this trap and ensure that we produce a system which does not go “silo” but sets a course to meet current and future needs at the enterprise level and not just departmentally? I was recently working on a presentation about data warehouse design, and the question came up of which is better: Ralph Kimball’s (Dimensional) or Bill Inmon’s (Normalized) design approach. The answer is that both work, if you keep to the big picture. Create your high-level design upfront and adhere to design and development standards. It is about setting a clear course for DW design and never allowing an independent or siloed solution to be developed. The key is to always look for integration opportunities. Add to your core data warehouse; don’t build a new structure that is not tied in, and use conformed dimensions.
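To make the idea of a conformed dimension a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_customer, fact_sales, fact_support_calls) and the sample rows are made up for illustration; they are not from any real client system. The point is only that two subject areas share one customer dimension, so their numbers can be reported side by side.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory warehouse (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One customer dimension, shared by every fact table (conformed).
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL,
            region        TEXT NOT NULL
        );
        -- Two subject areas that both point at the same dimension.
        CREATE TABLE fact_sales (
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            amount       REAL NOT NULL
        );
        CREATE TABLE fact_support_calls (
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            minutes      INTEGER NOT NULL
        );
        INSERT INTO dim_customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
        INSERT INTO fact_support_calls VALUES (1, 30), (2, 10), (2, 20);
    """)
    return conn


def sales_and_support_by_region(conn):
    """Aggregate each fact table on its own, then line the results up by region."""
    query = """
        WITH sales AS (
            SELECT d.region AS region, SUM(f.amount) AS total_sales
            FROM fact_sales f
            JOIN dim_customer d ON d.customer_key = f.customer_key
            GROUP BY d.region
        ),
        support AS (
            SELECT d.region AS region, SUM(f.minutes) AS total_minutes
            FROM fact_support_calls f
            JOIN dim_customer d ON d.customer_key = f.customer_key
            GROUP BY d.region
        )
        SELECT sales.region, sales.total_sales, support.total_minutes
        FROM sales
        JOIN support ON support.region = sales.region
        ORDER BY sales.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_and_support_by_region(build_warehouse()):
        print(row)
```

Because both fact tables use the same customer keys and the same definition of region, the sales and support figures line up without any mapping work. A silo built with its own private customer table would not give you that for free.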

Bill Inmon said, in referring to Dimensional modeling, that “A 1000 minnows do not make a whale”. Build with a great purpose in mind: that you are building an integrated, stable data repository to support business reporting and analytics.