Wednesday, February 13, 2013

Master of Your Data using the Database

I recently was involved in a project for an organization who needed one thing. They needed a master customer and master product list to enable cross-organization analysis. This may seem like a simple task; create a single customer and product but it is not simple.

As JFK said about going to the moon, "We choose to go to the moon in this decade and do the other things. Not because they are easy, but because they are hard.", the same is true of customer and product integration and mastering. MDM as a technology and a process is not easy it is hard but it provides so much value in the end that it is worth the journey to achieve.

The challenge of MDM is focused squarely on creating a technical solution which enables the business to automate the process of matching customer and products into a single master list. This can take significant effort to get to the point where the rules you defined for matching are meaningful and effective. 

The project I was involved in required us to create a solution which was cost effective and did not include the use of a matching product like DataFlux or Trillium but was based in the database and ETL tool. Our database of choice was Oracle which provided some SQL extensions to support matching. We implemented the matching within an ETL tool (Talend) which further extended our capabilities which we had in the database. A number of functions were consider and the following Oracle functionality was used in our cleansing and matching approach:
  1. Regular Expressions where used to find patterns and remove and alter to enable a standardization of names and addresses
  2. Equi-joins and other join types to match
  3. Soundex or Metaphone function in combination with other matches to enable fuzzy matches
  4. Jaro-Winkler, Levenshtein and Distance functions for fuzzy matching
  5. ETL Tool Functionality which further extends the base database functionality
 All of these functions can help you to find the right matches in your database and provide functionality to build your own MDM solution where you can leverage the investments you have already made in your database and tool without making a huge investment in software.

I will be presenting this solution at COLLABORATE13 in Denver in April, and this entry should help you as you consider an alternative approach to matching which will be critical to your MDM solution.