Teradata Join Optimization
The following case study shows how we can sometimes improve query performance by a slight change of the query text.
The starting point is below SQL statement. The primary index of both tables is PK (one of the tables is a volatile table, but this doesn't make any difference for our example). One of the tables is partitioned.
When I was executing this query, it became skewed. What I disliked immediately about this query was the join condition:
t02.PK = (CASE WHEN t01.TheCode <> ‘R' THEN t01.PK END)
It is never a good idea to apply functions on join conditions. This should be avoided whenever possible. Functions in joins are a hint that something is wrong with the data model (for example, if 1NF is violated)
There exist several disadvantages with expressions on join conditions:
- The optimizer will not have good statistics available for your expression.
- No AMP-local merge join (direct merge join) can be done, even if both tables have the same primary index (as in our example).
In our test case, there is an additional disadvantage which is revealed by substituting the join condition into the below SELECT statement:
Almost all resulting values of the join condition will become NULL values, i.e. being redistributed to the same AMP!
This is the reason for the high skew when running the query.
Below is the detailed execution plan. Table2 has to be redistributed by the hash code of our complex join expression.
Most of the rows are moved to the same AMP (usually the highest AMP holds the NULL values).
The subsequent join step (hash join) is skewed:
Fortunately, it's quite easy to rewrite the query in a way to avoid data redistribution and skewing:
The new version of the query doesn't require any join expression. It's doing straight-forward sliding windows merge join on the primary index of both tables. No data redistribution is required.
Here is the execution plan:
Finally, let me show you the improvements in resource usage:
|New Version||05.04 11:38||05.04 11:39||12379740||268.383,00||136,52||5.150.474.240,00|
|Old Version||05.04 11:40||05.04 11:46||12379740||437.771,00||178,46||5.152.128.000,00|
Disk IOs were reduced by about 40%, CPU usage by 25%.