Thursday, June 25, 2009

Dealing with Oracle Block Sizes

   When working on a data warehousing project that will be deployed on an Oracle platform, the discussions often turn to the Oracle database block size that should be used.

   We all know that in data warehousing environments one needs to move lots of data. This means that we are probably going to increase the size of the blocks in which Oracle will store data. The question then moves on to which one should we choose? Should we use 8KB, 16KB, 32KB or even 64KB. This is not a simple question to answer and if you ever did a search to find information on this decision; you find the information rare and often esoteric. The decision needs to be one that considers how information will be loaded and retrieved from your data warehouse.

   So when we decide we look at our data usage. We know that we will be loading millions of rows a data via a bulk loader like DataPump; we know this benefits from larger block sizes as the writes will pack more rows into each database block. We know that we are going to be doing updates during the DW process, so if are going to do any lookups then we need consider that these blocks will contain a larger number of rows, which means our reads will also retrieve more information than we really need. As for the reporting perspective; generally this will require reading larger volumes of data for analysis and aggregation. Again this benefits from larger block size by reducing the number of individual reads, which is good. Now we need to consider multi-block read counts. If we have increased the size of blocks and teh multi-block read count, means that we may move large amounts of data through our I/O channels that may cause bottlenecks, so this is another area that we need to manage and be concerned with.

Ultimately there is no right answer, other than it depends. It depends on the exact nature of your data and your data usage. This must be balanced with the server to optimize performance. The last thing we want to do is introduce new bottlenecks. Generally data warehouses will see block sizes that range from 8KB to 32KB, with multi-block read counts in the 8 to 32 range. Bottom line is test what works for you using a real world load to determine what works best for you.

No comments:

Post a Comment