Monday, May 27, 2013

Designing a Future-Proof Data Solution



The challenge we face today when designing solutions is how to avoid the pitfalls of constant design change. How can we reduce the impact to our data designs? Is it even possible?
The design of a data warehouse has been well discussed and debated over the years. The battle between Ralph Kimball and Bill Inmon is legendary, and the choice of an Information Factory versus a Dimensional approach continues to be one which every new data warehouse needs to consider. In this discussion the choice is really immaterial; whichever approach you choose, you will still need to consider how the design will be developed. Can we build the design incrementally? Can we minimize the impact on the overall project and minimize regression testing? This is the constant challenge we face when developing data solutions at EPAM, especially when using Agile practices to drive our project success. The key is to design and develop once and to evolve the design as you go, but there are some key considerations you will need to make to optimize the design and minimize the refactoring which may be required by design changes.

The first item you must consider is using Design Patterns for your data warehouse modelling. This approach says that all objects will be built from templates which address most of the needs within your design, meaning that similar tables in the design will follow predefined patterns. At the most basic level we predefine what a dimension and a fact will look like: each will include a surrogate key and the attributes it requires, along with control fields which allow us to manage how and when data is processed. For more complex facts and dimensions we also provide templates which support the different types of slowly changing dimensions and manage quickly changing facts; both of these more advanced designs let us manage the data effectively and consistently. For relationships we define intermediate "bridge" tables, which provide a reliable way to relate facts and dimensions, improve performance, extend query capabilities in the future, and form a key part of allowing the model to work across multiple subject areas.
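To make the template idea concrete, here is a minimal sketch, in Python and with hypothetical column names (surrogate_key, effective_from, effective_to, is_current), of the bookkeeping a Type 2 slowly changing dimension template performs. A real implementation would live in the database or ETL tool and carry the full set of control fields described above; this is only an illustration of the pattern.

```python
from datetime import date

# Minimal sketch of a Type 2 slowly changing dimension template.
# Field names are hypothetical; real designs add further audit/control columns.

def scd2_apply(dim_rows, incoming, key, tracked, as_of):
    """Apply one incoming source record to an in-memory dimension.

    dim_rows: list of dicts, each one dimension row
    incoming: dict holding the business key plus tracked attributes
    key:      name of the business-key column
    tracked:  attribute names whose changes trigger a new version
    as_of:    effective date of the incoming record
    """
    current = next((r for r in dim_rows
                    if r[key] == incoming[key] and r["is_current"]), None)
    if current and all(current[a] == incoming[a] for a in tracked):
        return dim_rows  # nothing changed: no new version needed
    if current:
        # expire the old version
        current["is_current"] = False
        current["effective_to"] = as_of
    dim_rows.append({
        "surrogate_key": max((r["surrogate_key"] for r in dim_rows), default=0) + 1,
        key: incoming[key],
        **{a: incoming[a] for a in tracked},
        "effective_from": as_of,
        "effective_to": None,
        "is_current": True,
    })
    return dim_rows

# Example: a customer moves cities, producing a second version of the row
dim = []
scd2_apply(dim, {"customer_id": 42, "city": "Toronto"}, "customer_id", ["city"], date(2013, 1, 1))
scd2_apply(dim, {"customer_id": 42, "city": "Minsk"}, "customer_id", ["city"], date(2013, 5, 1))
print(len(dim))                          # two versions of customer 42
print([r["is_current"] for r in dim])    # only the latest is current
```

Because every dimension follows the same template, this one routine (or its SQL/ETL equivalent) can service all of them, which is exactly the consistency the pattern approach is after.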

The second consideration is to Design with the Future in Mind. Here you are faced with the choice of building based only on defined requirements, with little consideration for future ones. In the Agile context this seems like the obvious approach: design what you need when you need it, and the concept of Just-in-Time Design has been discussed and developed along these lines in the past few years. However, when we put this into practice, the reality is that you want to define your facts and dimensions as completely as possible the first time you design them, so that you design for future needs in addition to the ones you have at the moment. This will result in additional attributes which might only be used in a much later sprint but are defined now in order to reduce refactoring. It may also be necessary to define additional dimensions to minimize the rework of adding dimensions to fact tables later. The key is to design what you need when you need it, while providing as much forward thinking in your object definitions as early in the process as possible.

The final suggestion I have for future-proofing is to ensure that your data warehouse is designed to support the integration of multiple data sources right from the start, by adding the attributes and ETL functionality which support this approach. The data warehouse is really all about providing the business with an integrated and reliable solution; therefore you must design with the goal of integration from the beginning.

Ultimately, the design and development of a data warehouse requires the data architect and data modeler to look to the future. They need to anticipate data requirements and define the data objects and relationships as completely as possible right from the start. You can avoid many of the pitfalls of data warehouse design by designing with the end in mind while allowing the design to evolve based on business needs.


Wednesday, February 13, 2013

Master of Your Data using the Database

I was recently involved in a project for an organization which needed one thing: a master customer list and a master product list to enable cross-organization analysis. This may seem like a simple task, just create a single customer and a single product list, but it is not simple.

As JFK said about going to the moon, "We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard." The same is true of customer and product integration and mastering. MDM as a technology and a process is not easy, it is hard, but it provides so much value in the end that it is worth the journey.

The challenge of MDM is focused squarely on creating a technical solution which enables the business to automate the process of matching customers and products into a single master list. It can take significant effort to get to the point where the rules you define for matching are meaningful and effective.

The project I was involved in required us to create a solution which was cost effective and did not include the use of a matching product like DataFlux or Trillium, but was instead based in the database and the ETL tool. Our database of choice was Oracle, which provides some SQL extensions to support matching. We implemented the matching within an ETL tool (Talend), which further extended the capabilities we had in the database. A number of functions were considered, and the following Oracle functionality was used in our cleansing and matching approach:
  1. Regular expressions, used to find patterns and remove or alter text to standardize names and addresses
  2. Equi-joins and other join types to match records
  3. The Soundex or Metaphone function, in combination with other matches, to enable fuzzy matches
  4. The Jaro-Winkler and Levenshtein edit-distance functions for fuzzy matching
  5. ETL tool functionality which further extends the base database functionality
All of these functions can help you find the right matches in your database and give you the building blocks for your own MDM solution, one which leverages the investments you have already made in your database and ETL tool without a huge investment in new software.
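As a rough illustration of the cleanse-and-match pipeline, the sketch below re-creates the standardization and edit-distance steps in Python. The abbreviation map and distance threshold are made-up examples; in Oracle the equivalent steps would use REGEXP_REPLACE, SOUNDEX and the UTL_MATCH edit-distance and Jaro-Winkler functions.

```python
import re

# Illustrative cleanse-and-match sketch. Rules and thresholds are examples
# only; a real MDM rule set is tuned iteratively against real data.

ABBREV = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD"}

def standardize(name):
    """Uppercase, strip punctuation, collapse whitespace, expand abbreviations."""
    name = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper()
    return " ".join(ABBREV.get(w, w) for w in name.split())

def levenshtein(a, b):
    """Classic edit distance (what UTL_MATCH.EDIT_DISTANCE computes)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_match(a, b, max_distance=2):
    """Standardize both values, then accept near-identical strings."""
    return levenshtein(standardize(a), standardize(b)) <= max_distance

print(standardize("123 Main St."))                  # 123 MAIN STREET
print(is_match("123 Main St.", "123 MAIN STREET"))  # True
print(is_match("123 Main St.", "456 King Ave"))     # False
```

The same two-step shape, standardize first, then compare with a fuzzy function and a threshold, is what the database-based approach implements in SQL and Talend.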

I will be presenting this solution at COLLABORATE 13 in Denver in April, and this entry should help you as you consider an alternative approach to matching, which will be critical to your MDM solution.

Tuesday, October 23, 2012

Oracle OpenWorld 2012 – What’s New…

Last week I attended Oracle OpenWorld. This was at least my 10th time attending the event and it continues to grow and change as Oracle does. One thing that does not change is San Francisco, which is always amazing and this year was sunny and warm.


The conference is the annual gathering hosted by Oracle and is the place where the company gets to talk about what is new and what is influencing our businesses. So Oracle made many announcements during the week related to Cloud, Big Data, new hardware and a new release of the Database. Each is hot these days and Oracle continues to bring new and updated offerings to the market to meet this changing landscape.

In the Big Data space, as with the Cloud space, this year was a year to encourage the adoption and use of the technology. In the area of Cloud, Oracle offered plans for people to migrate from in-house systems to the Cloud and discussed strategies to make the transition as easy as possible. The challenge I have heard about from companies moving to the Cloud is that their existing systems are old or have been significantly customized, so the move is not simple or straightforward. For some companies this move will not be as easy as was described.

The big news for me was that Oracle officially announced a new version of the database; version 12c is on the way. This new version has a number of enhancements, the biggest for me being the concept of the pluggable database. The pluggable database is a feature which provides significantly better support for databases to react to hardware, platform and version changes. A pluggable database can be easily moved to another container database, and a container database can hold many pluggable databases within it. Of course Oracle made other changes in the new version, but this was the most significant, as it changes the underlying architecture of the database.

Of course, as usual, the big buzz was about Big Data. Oracle continued to sell the concept and help customers see how Big Data can help. This year the idea transitioned from theory to practice and experience; people were now discussing use cases (as I did during my Big Data presentation). The why is moving to the how. Below is Andy Mendelsohn, Oracle Senior VP of Database Server Technologies, telling us about the Oracle stack for Big Data and how the Big Data Appliance can help.

[Photo: Andy Mendelsohn presenting Oracle’s Big Data stack]

Overall the event was the usual offering for Oracle OpenWorld which helped many to better understand what is coming and how we need to get ready for it.

Tuesday, September 11, 2012

My IOUG is coming to Oracle OpenWorld

It’s that time of the year again. The summer is slowly coming to a close, the evenings are getting colder and the days shorter. The other thing that arrives at this time of the year is Oracle OpenWorld, the annual gathering of Oracle customers hosted by Oracle. It all starts on September 30th, when I and lots of other Oracle professionals will descend upon San Francisco for the annual event.

I have to admit one of the great highlights of the week is the very first event: the IOUG at User Group Sunday, where the user group starts the week with presentations, discussions and panels on the deepest parts of Oracle’s technology, and where users share their stories and experiences. I will be presenting a seminar called Big Data: the Future is Now. OpenWorld is the place where I get to meet old friends and colleagues and hang out with my cool cousin (who lives in San Fran); it is one of the two big gatherings of users each year and the place I renew many friendships. It is like a geek pilgrimage.

The remainder of the week is all about Oracle. It is at OOW where we get to hear from Larry Ellison and listen to his vision for the future. We hear about some of the new technologies which will become part of our fabric in the future; I remember hearing about Big Data a few years ago as a concept, and now it is becoming mainstream. I will be speaking during the week about how Agile has helped EPAM to deliver Big Data projects, a very exciting topic these days for the effective creation of data and reporting solutions. And of course there are the networking events… this year we even get to see Pearl Jam, and the Oracle Music Festival includes Macy Gray. How do they do it?

So why not come by and see all of us in the user groups and become part of the fun?

Here are Ian’s Top 5 Benefits of the IOUG at OpenWorld

5. Best directions to sessions
4. IOUG helps people separate reality from hype… after 5 drinks.
3. Get to finally meet TV star John Matelski. Looks like he may be the next host of Meet the Press!
2. Coolest t-shirts
1. Special IOUG lines at all food and drink counters for all OpenWorld events!

So I hope to see everyone at the event. You will find the IOUG booth at Moscone West in the user group pavilion. See you there.

Thursday, May 24, 2012

A Business Intelligence Adventure

There are times in your life when you get a chance to experience something different and exciting, and last week was one of those experiences which I will not soon forget. Last week I had the opportunity to be part of a Business Intelligence event held in Minsk, Belarus by my new company, EPAM.

The chance to go half-way around the world to speak about my favorite topics was as exciting as it was scary. I travel a lot for work and pleasure, but this was different. The countries of the former Soviet Union have always held a special connection for me, as my grandparents were from Latvia and Ukraine, and Minsk sits right between the two. I didn’t know what the trip might hold. Would I be able to get around without speaking any Russian? Would they like what I had to say? At least we had some things in common, like hockey, the weather and our love of data. All my fears quickly dissipated once I finally left the airport. The Minsk airport is still a bit of a throwback to the days of Soviet rule.

The airport has only 6 gates and no lines. When you arrive in Belarus you need to purchase medical insurance (2 Euros) and of course meet with Passport Control to get final clearance into the country. This was the first time I had travelled to a country which required a visa, and although I was anxious before my trip about entering the country, it was a smooth trip to Minsk.

I was headed off to visit the team at EPAM, on a trip which took me through the countryside and into the city. I was struck by how much the Minsk area looked like anywhere in Canada. The forests of white birches were a welcome sight, and the cars people drive there are no different from ours. What was different was the grandeur of the architecture, which seemed like a modernized version of the old Soviet Union. The streets in the city core are wide and grand. Below is an image of Independence Square, which houses the Belarusian government and a huge shopping centre right below the square.

[Photo: Independence Square, Minsk]

This is what I expected; these are the types of buildings I had pictured in my mind. The biggest realization was that the shops were fully stocked with goods, much like we have in Canada. The brands might be different and you can buy vodka for less than $10, but this is a country where success is coming as it evolves from its modest past. This is a country that has welcomed the new age and is working to build and grow.

The visit to EPAM and speaking at a BI event in Minsk were my reasons for being there. EPAM is a company with a strong knowledge base in many technical areas. It is a company of 9,000 professionals who deliver top-notch solutions, and now, with the purchase of Thoughtcorp, it can begin to show that strength in Canada along with our team.

So back to the experience. I presented at the first <epam> BI conference held in Minsk. It covered subjects ranging from what data is and why it is important, to how Oracle and Microsoft can support data solutions for their customers. It was a great time where I discussed the future of data and how important it is becoming today, and how businesses which embrace data and fact-based decision making can make a significant impact on an organization’s success. The audience was great and the questions were thoughtful and interesting. It was a great experience which I will not soon forget.

The next day was a visit to <epam>, where I had the chance to talk about my experiences using Oracle and how we run data projects. I was again struck by the quality of the people I met. This was a strong team which wanted to learn more and get even better. We discussed Oracle’s direction around data and how we can get the most from our database investments. This is a picture of the offices in Minsk, with the lead of the Oracle Performance team, Andrei. It was an awesome experience and a great introduction to the people and skills which EPAM brings to the market. I thank them all for letting me be a part of it.

Of course no trip to Minsk would be complete without some comments about the food and drink. Belarus is known for some great vodkas, and on this trip I got to experience many choices: straight vodka (awesome), cranberry vodka (awesomer) and some vodka made with bison grass (a spicy, smoky awesomeness). The food was also an experience and reminded me of my childhood, when my family would make very similar dishes. The potato pancakes were a throwback to the days when fried food was good for you. All of the food here is made from scratch, and I am told it is all organic.

Overall, the trip to Minsk was an experience which I will never forget. The team at EPAM taught me a lot as well. It was a chance to see how a country like Belarus can rise to become modern and technically advanced; Belarus has few natural resources, but one thing it does have is a lot of smart people doing some very innovative things. It was great to experience, and I look forward to my next visit to Minsk… after all, I forgot my sports jacket in a bar in Minsk, so I need to go and pick it up!

Monday, April 9, 2012

Delivering Data Projects With Agility and Success

Delivering data projects has always been fraught with danger. Data projects tend to be large and encompass many aspects of the organization, so the time to build a complete data solution can run to months or years. Too often these projects found, at the end of a long waterfall-style effort, that the results were less than expected. The system may have hit many of the business requirements, but it missed on others, while new requirements never even entered the delivery process. It is said that 50% of data warehouse projects fail, and other studies have put the number even higher. In my travels, I would say that we hit the target most of the time, but it usually takes longer to achieve than expected. So why should we consider Agile?

Agile is an approach based on four values which were defined in the Agile Manifesto. These are:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

You should note that we value both sides of each statement, but that we value the items on the left more. These values provide us with the basis for working in a collaborative environment focused on incremental working software.

How does Agile help us to be more successful in data projects? The project approach for data which we follow at Thoughtcorp focuses on using Agile, which results in delivering the solution incrementally. By understanding the big picture of an organization’s data needs, we can divide a project into iterations which build upon each other while delivering working software. This is done via a prioritization process; in the Agile world this is known as Kanban development. In this approach we continually review our priorities and check whether the business now has new requirements, which allows the project to alter its trajectory based on real needs that are now better understood. The result is improved project performance and a better solution for everyone.

The basic answer is that it does help; data projects can be delivered faster and more effectively. We have seen our productivity increase by 27% and defects reduced by 35% versus typical data projects. In addition, the number of features delivered was higher than anticipated. All of this resulted in a project which included over 300 data objects and 150 reports and was delivered successfully in 8 iterations. Below is what our project wall looked like as we were working, as we had a lot of collaboration taking place:

[Photo: our project wall]

This project was one example of how Agile made data work. We learned along the way and perfected our approach. The great thing about Agile is that you make it work for your team and your projects, but you have to invest the time and effort to make it work.

And don’t forget, COLLABORATE 12 is coming up in Las Vegas in just a couple of weeks. I hope to see everyone there. I will be speaking all about Big Data and how it fits into today’s data ecosystem.

Monday, January 9, 2012

Entering 2012 With Promise and Concern

The new year has brought with it another year of change and evolution. The economies of the world are changing and forcing all of us to make important decisions on where we are going and how best to get there.

The economic crisis in Europe and the rest of the world keeps us on the edge of uncertainty. We are living today in a world with many questions and, in many ways, a lack of direction. The big concerns become: where do we go from here? What is the next big thing? And how can we as a civilization contribute to a better overall community?

Well, I can’t solve the world’s issues in this blog, but at least we can look at how technology can help in the coming year and what we should expect based on how technology is driving employment and changing our working environment.

I read an article recently about the 5 hardest jobs to fill in 2012, and what I discovered was that most of them are related to technology and one’s knowledge of how to use the technology which is available. The list includes:

  1. Software Engineers and Web Developers
  2. Creative Design and User Experience
  3. Marketing
  4. Product Management
  5. Analytics

All of these jobs show us what is hot. They need people who are well versed in technology, and the technology underlying these jobs is hot as well. Consider Marketing and Analytics jobs: both need people who can look at data and analyze it to draw out competitive differentiators. In systems development we see the rise of the mobile app, and understand that there is a shortage of people with this latest development skill; if you have tried to find an HTML5 developer, you are well aware that they are few and far between.

Technology today is driving these needs. The database appliance, as well as Big Data technology, is making it easier for companies to collate their data into information repositories which provide significant value. Data is being delivered to users in effective and intuitive ways which enable the organization to achieve growth in these uncertain times. The mobile device is changing the way we interact with information and making it more readily available when and where people want it. Times are changing, and technology is the driver of our future. So take the time to see what is happening today and envision where you and your organization will take it in the coming years in order to survive and thrive.

Tuesday, October 4, 2011

Big Day for Big Data at Oracle OpenWorld

When you come to Oracle OpenWorld you realize that the world is changing. I find that when Oracle embraces a technology they may not be first but they are there when it counts and that is true for Big Data and the Oracle ecosystem.

Big Data is big news, and Oracle has shown it is ready to take on the challenge. On Monday, Andy Mendelsohn, SVP of Database Server Technologies, spoke to us about Oracle’s Big Data strategy.


He discussed what Big Data is and how it is used. The explosion of information today is resulting in major changes to technology. We need to understand NoSQL databases, we need to embrace Hadoop and we need to change the way we think to find the information nuggets which will support business advancement.

So what has Oracle done? Let me count the ways:

  1. The Big Data Appliance
  2. The Oracle NoSQL database
  3. The Oracle Loader for Hadoop
  4. Oracle Data Integrator for Big Data
  5. R Enterprise

This is significant for those of us in the data space; the time for Big Data may be now, or soon. It shows me that the technology has been validated and it is time to start looking into it. This is very exciting technology, and considering the mass of information we are collecting, it will be important for us to use this data to achieve a competitive advantage.

In April 2012 at COLLABORATE 12, I will be running a bootcamp on Big Data, and it will be the first place that real-world experiences in Big Data will be shown.

These are Big times for Big Data!

Monday, September 12, 2011

If Someone Says “I Think”; Tell Them to Prove It! – IOUG Real World Performance: A Review

It seems like an odd statement. We always hear people say “I think…” and then they go on to tell you what they have hypothesized. Now you wonder: is this story factual? Does it REALLY work? It is then that you need to ask them to prove it. We don’t often challenge our peers to “prove it”; we accept thoughts and then experiment on our own to see if the thought proves to be correct.

This past Friday I had the chance to attend the IOUG’s Real World Performance seminar, which was held in Toronto. It was at this seminar, led by three of the most knowledgeable Oracle people in the world, that the proof showed us how to change the way we think. As they say, there is no reason for things to run slowly today; instead, people have made the choice to run this way rather than take advantage of new technology and new methods.

Andrew Holdsworth, Tom Kyte and Graham Wood led this amazing day of proof. The image below is a picture from my chair of the three of them up front in the room (as they were all day).

[Photo: Andrew Holdsworth, Tom Kyte and Graham Wood at the front of the room]

It was a day where the three Oracle database performance gurus showed us a new way to look at performance, discussing both data warehouses and operational systems. So what were some of the highlights?

  • Today’s data warehouses have undersized CPUs and insufficient I/O
  • Improve data loading, as data sizes will continue to expand
  • Stop using SQL*Loader; its time has passed. Consider external tables
  • Read compressed data where possible
  • Statistics are critical to success
  • Make sure cardinality is accurate in table statistics
  • Use SQL Monitor to investigate the performance of SQL
  • Manage your database and system resources; use Instance Caging if needed
  • Use set-based operations as often as possible
  • Check your SQL statements for issues

The biggest takeaways for me were that today we should try to achieve full CPU utilization, we should optimize how we transfer data into a data warehouse, we need to make sure our SQL statements work as expected, and that indexes and partitions can be both your friends and your enemies.
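The “use set-based operations” advice can be demonstrated with any SQL engine. The sketch below uses SQLite from Python purely for portability, with made-up table and column names, to contrast row-by-row processing (what Tom Kyte calls “slow-by-slow”) with a single set-based statement; the principle is identical in Oracle.

```python
import sqlite3

# Illustration of set-based vs row-by-row processing. Table and column
# names are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
conn.executemany("INSERT INTO sales (amount, region) VALUES (?, ?)",
                 [(100.0, "EAST"), (200.0, "WEST"), (300.0, "EAST")])

# Row-by-row: one statement execution per qualifying row
for (row_id,) in conn.execute("SELECT id FROM sales WHERE region = 'EAST'").fetchall():
    conn.execute("UPDATE sales SET amount = amount * 1.1 WHERE id = ?", (row_id,))

# Set-based: the engine applies the change to every qualifying row at once
conn.execute("UPDATE sales SET amount = amount * 1.1 WHERE region = 'WEST'")

print(conn.execute("SELECT SUM(amount) FROM sales WHERE region = 'EAST'").fetchone()[0])
print(conn.execute("SELECT SUM(amount) FROM sales WHERE region = 'WEST'").fetchone()[0])
```

Both approaches end with the same data, but the set-based statement does the work in one pass inside the engine instead of shuttling each row through the client, which is where the performance difference comes from at scale.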

It is amazing to me to see how things have changed over time. Today we need to manage our data better, run our databases more effectively and improve the processes of loading and retrieval. The tools are there for you; we just need to use them. As Tom Kyte said, “Why would people choose to run their database slowly?”

There are more IOUG Real World Performance seminars being held this year. If you get the chance to attend, you should, as it can change your life, or at least the life of your database.

Tuesday, August 30, 2011

What do Data People do?

I find it an interesting time in the days of data. I see how some companies are changing and embracing the data revolution, while others are not. What I do see is that there is a thirst for data and for the consumption of data in a meaningful way.

The terminology we use is changing along with the technology. Last week someone asked me what I did. I said I was a data architect; this was met with a blank stare. So I changed tack and told them I do “Business Reporting and Analytics”. This they seemed to understand… at least the reporting part. For most people, analytics is some form of math which they may have learned in a statistics class but never really understood; the key was that they passed the test. Most consider it something they have heard about in the news.

So it got me thinking: what am I? What is my job? I think there are numerous terms that might describe what data people do today, or maybe we just want to make the job of pushing numbers all day sound sexy. Here’s what I and others have come up with:

  • Data Warehouse Architect
  • Data Guru
  • Information Technician
  • Data Scientist
  • Data Analyst
  • Analytics Geek
  • Information Shark
  • Information Jockey

The ideas are endless, just like the information we use. As non-data people begin to see the power of data, and as information is served up to the everyday person, I will need to explain what I do. No title will summarize it well. Basically, I tell people I collect data and draw nice pictures with it, and that always seems to make people happy, which is a good thing.

Don’t forget to join me at Oracle OpenWorld in October for my presentation about Big Data and its challenges and outlook.

Tuesday, August 9, 2011

2011 IOUG/Oracle Real World Performance Tour in Toronto – Not Your Regular Seminar

There are some training events people should make every effort to attend because they are valuable and unique. This upcoming Toronto seminar is one of those occasions where you need to find some time. I hope many of you can join me at the upcoming IOUG Real World Performance Tour as it arrives in Toronto on September 9th, 2011.


Tickets are on sale now for the 2011 Toronto stop of the Oracle Real World Performance Tour – featuring Tom Kyte, author of the famed AskTom blog; Andrew Holdsworth, head of Oracle's Real World Performance Team; and Graham Wood, legendary Oracle Database performance architect.

Buy Tickets:

Friday, Sept 9 - Toronto

Radisson Toronto East

BUY TICKETS

Sound like an ordinary workshop? Think again.

This interactive performance engineering event features the rivaling perspectives of three Oracle rock stars and dueling screen projector presentations for a fun and different educational experience. Get a sneak peek by checking out the video from the first leg of the tour.

Past participants have praised the Real World Performance Tour for:

“Insight into how we handle different systems” Arizona fan review

“Got some ideas how to improve performance even without upgrading to 11 g” L.A. fan review

“Tuning nowadays vs. two versions ago; very reasonable price” L.A. fan review

Oracle Real World Performance Tour

Jam Sessions | 9 a.m. – 5 p.m. | $175 IOUG Members | $225 Non-Members

Discounted rates for COLLABORATE 11 attendees
All IOUG COLLABORATE 11 attendees will receive discounted access to these world-class experts; enter the code you received via separate email to get the special rate of $150 USD!

Discounted Group registration for 3+ Attendees
Register at least 3 days prior to the event (no onsite bookings)
www.ioug.org/rocks


Friday, August 5, 2011

Big Data is Curious

I was watching a TV show about Studs Terkel, a man who wrote about American life. His story is an interesting one. One statement I heard on the show was:

“Not everyone has a depth of curiosity.
And not everyone has a depth of understanding.”

I found this an interesting comment and one that I relate to in my everyday world. It is so true that data is expanding faster than our ability to analyze it. This is where Big Data comes in; together with an Agile approach to development, it can form an important part of a data solution. The ability to collect more structured and unstructured data gives us the ability to feed our curiosity. It is imperative that today’s analysts have the same curiosity as the ancient explorers. Curiosity is an opportunity to learn and to discover; this is where real innovation happens. We must use Big Data to gather information to feed the need for understanding. We need to find meaning in this information, and exploring it in a dynamic way can support this need. So we look to Big Data to provide this sandbox of data for analysis and knowledge.

There are concerns with Big Data which we also need to address. Security is one of the biggest concerns for most organizations, so we need to look at how Big Data is deployed. Often we look to create clouds of data, but is this data secure? Is this data protected? The danger with collecting more information is deciding how and what to protect. This is a new challenge, and if we are to succeed in making data more accessible and more useable, we also need to ensure it is protected. As Big Data matures and Hadoop continues to become mainstream, we are seeing products emerge to support these requirements.

So as the technology advances and our curiosity grows, we will be able to create solutions which provide analysts with a robust ability to gain business understanding.