In my (mainly) Oracle Data Warehouse centric world I have always appreciated the VLDB features showcased in the documentation; as documentation goes these two publications are more readable than most: VLDB Guide and Data Warehouse Guide. I always find new stuff in each new release of the database . However, one of the great things over the past (almost) two decades has been table partitioning; true, other vendors do similar things now, but Oracle’s way always seemed easy to adopt and performed well. In fact before familar Range – Hash composite partitioning came out in Oracle 8i there was the partition view feature; this is now depreciated, but last time I looked still worked (Oracle removed the init parameter for this but left it set as true) however this needs to be verified if you want to use it and is almost certainly not supported officially. Putting my slippers and dressing gown on for the “old man that did DWH development fireside chat” mode I can confess that I quite liked the simplicity of partition views and the way it ‘eliminated’ unneeded tables from the queries – in case Google has forgotten about this feature (it hasn’t); we create individual tables for the data ‘partitions’ then create a UNION ALL view over them all that includes filter predicates specific to each partition and optionally supported by check constraints on each table – it’s this filter that allows the elimination to take place.
Now assuming my usual persona of Alfred, butler to Analytics Batman, I will continue.
Partitioning data is part of the “divide and conquer” strategy for data warehousing. Although the DBA side of the house look at partitions from predominately a data life-cycle management (DLC) / back up and recovery viewpoint we should also consider how we partition for query performance. For DLC we can look at highly compressed ‘historic’ data, tiered storage (old data on slower disk, active data on faster storage / in-memory) we might change partition sizes for historic data too, there is no requirement that partitions are uniform over time. For backup and recovery we might consider making old partition tablespaces readonly. Partitions are about making performance better, through reducing the amount of data to scan and through improving how we might access it in parallel query.
With each release of the database the capabilities of partitioning increases and coupled with data clustering and zonemaps we can do some pretty powerful things. For example in our SHIP_TO fact we force customers in the same state to have their records close to each other even if state is not explicitly keyed on the fact table.
I think that one of the great partitioning innovations is being able to automatically create partitions based on a numeric or date interval. This works with deferred segment creation and will create a partition only when there is data to insert into it. Gone are those hassles of pre-creating partitions in advance of the batch. However there are some quirks that we need plan for:
- Partitions get created with system generated names.
- partitions are not created until they have at least one row to insert into it.
Not knowing the name of the partition is not as much a problem as we might first think. Oracle has given us a new syntax to manipulate a partition based on a value of partition key. This “PARTITION FOR” syntax allows us to do things such as exchange partitions or truncate partitions without knowing the name of the partition, we could even use it to rename the partition (but more on renaming below). Without this extension clause it is more problematic to obtain the partition name because the high limit value of the partition is held in a LONG column in the data dictionary and is not so easy to interpret.
The second point is down to deferred segment creation. When we issue ALTER TABLE commands on a partition, it has to exist before it can be manipulated. True, it is possible to generate a dummy row to force a segment creation before we truncate or exchange, but to my thinking that goes against the spirit of RANGE INTERVAL partitioning. What we should do here is to issue the command and anticipate it failing; with tools like ODI we just set the ‘ignore errors’ check box on any step likely to fail through missing segments, or with more code, trap and handle the error in the command step.
In a data warehouse we often use bitmap indexes on the dimension key columns of a fact table. For partitioned tables these need to be LOCAL to the partition. When we are loading the table it is a common practice to make these bitmap indexes unusable so as to remove the update index overhead until we have completed the load; for non-partitioned tables this is simply set the index unusable and rebuild it when done. But for partitioned indexes we would traditionally mark unsuable only the index partitions affected, do the load and then rebuild just those partitions afterward. There is another way we can achieve this with partition exchange loading but for this blog I’ll only consider direct appending to a partition.
The problem we have is that the PARTITION FOR syntax does not work with ALTER INDEX statements that we use to set an index partition unusable. we can’t say:
ALTER INDEX DWH.SALES_BMX19 modify partition for (date '2018-03-01') unusable;
However all is not lost, we can get a little inventive. First off let’s look at those partition names in the data dictionary
select PARTITION_NAME from Dba_IND_PARTITIONS where index_owner = :OBJECT_OWNER and index_name = :OBJECT_NAME;
Those names look similar to those used for the table itself, so next join to DBA_TAB_PARTITIONS to verify that LOCAL BITMAPS on INTERVAL PARTITIONED tables are stored in partitions with the same name:
with index_bit as (
select PARTITION_NAME from Dba_IND_PARTITIONS where index_owner = :OBJECT_OWNER and index_name = :INDEX_NAME),
table_bit as (
select PARTITION_NAME from Dba_tAB_PARTITIONS where table_owner = :OBJECT_OWNER and table_name = :TABLE_NAME )
from index_bit full outer join table_bit on index_bit.partition_name = table_bit.partition_nam; -- full outer to expose any non-matches
Perfect! all partitions match. This means we can use the partition name for the table partition as the index partition name. If only we knew what that was. However, that is something simple to find, and again from the data dictionary.
This time I am finding a single row in my table partition and using that to identify the partition name from the rowid
where data_Object_id = (
select dbms_rowid.rowid_object(rowid) from DWH.SALES
partition for (DATE '2018-03-01')
where rownum = 1);
Now we have the partition name for the data for that date we can use that to build the dynamic SQL we need to set the index partitions to unusable.
Rebuilding the indexes after the data load is easier. We just look for unusable index partitions in DBA_IND_PARTITIONS and rebuild them – I use DBA_INDEXES to refine the selection so as to rebuild the bitmaps for a single table at a time.
You may think that renaming the system generated table partition name is a solution for this but that is not the case as renaming the data partition does not rename the associated index partition.
- We can’t rename a partition before it is created
- Adding data
- creates a system generated partition name (and partition)
- and at the same time creates the local bitmap partitions (with the same partition name as the data)
If we rename the table partition at this point we break the relationship between table and index names. If we must rename table partitions (why??) we should rename our index partitions at the same time to keep a relationship in place