92% less CPU Usage with a Simple Trick – Teradata PPI Tables
Joining partitioned tables can be quite expensive, especially if partitions don't match or one of the tables is not partitioned. Today I discovered an interesting trick which can be applied to reduce resource usage if joining two tables with matching Primary Indexes and Partitions.
The test setup consists of the following two tables (both tables containing about 50.000 rows):
First, we are executing the following query, which selects all partitions (see DDL's above). We could leave out the WHERE condition, and would get the same result:
The execution plan below shows that both tables are redistributed to a new primary index (SalesId, SalesDate), and a consecutive merge join is done:
In a next step we change the query, by removing the last date from the WHERE condition (i.e. ‘2016-12-31')
The resulting explain plan reveals that the Optimizer switches to a rowkey based merge join (without doing a costly row redistribution):
Afterwards, I compared the resource usage of the following two statements. Both of them are delivering the same result, but one is doing the standard merge join, the second query is designed to “force” the Optimizer into the rowkey based merge join:Like expected, the rowkey based merge join is much cheaper and has a much better run time:
|Logical IOs||CPU Seconds||Spool (GB)|
|RowKey Based Merge Join||91,00||0,02||0,10|
|Traditional Merge Join||272,00||0,24||1,80|
|Reduction in Resources||67%||92%||94%|
This might be a great trick whenever you want to retrieve all partitions of two PPI tables being joined together. When leaving out a WHERE condition covering the partitions you may end up with a traditional merge join. My trick allows to force the Optimizer to do a much cheaper rowkey based merge join.
While I don't understand why the Optimizer is choosing a standard merge join if we select all partitions, this article shows that there is a lot of potential for optimizing joins by manually rewriting your queries!
UPDATE: It looks like the Optimizer avoids in some cases the rowkey based merge join, if the tables are very small. There seems to be an Optimizer rule which says “if the tables are small and all partitions are retrieved, go for a traditional merge join. This trick probably doesn't work with big tables as the Optimizer by default will chose the rowkey based merge join.