Skewed Teradata Joins – The Initial Situation
Let’s take the following scenario as an example. One table contains currencies, the other table contains the accounts of our customers including the currency in which the account is managed.
This means that the ISO code of the currency is the foreign key of the account table:
CREATE TABLE Currency ( CURRENCY_CD VARCHAR(20) NOT NULL, CURRENCY_NAME VARCHAR(200), ) PRIMARY INDEX (CURRENCY_CD); CREATE TABLE Customer ( CUSTOMER_ID INTEGER NOT NULL, CUSTOMER_NAME VARCHAR(255), ... CURRENCY_CD CHAR(20) ) PRIMARY INDEX (CUSTOMER_ID);
Join using the Column CURRENCY_CD requires that both tables have this column as their primary index and that the rows of both tables are available on the relevant AMPs.
As you can see immediately from our example table for accounts, most accounts have no currency assigned (NULL). Therefore, the join will be skewed.
Even though current versions of Teradata make appropriate optimizations to reduce this problem (e.g. by re-hashing the rows of well-distributed values, but copying rows with a NULL value to all AMPs), this problem still persists.
Especially if the optimizer does not have the corresponding statistics about biased values or only has them obsolete.
A Possible Solution To This Problem
One possible solution is to replace the NULL value with a value that is as distinct as possible.
In our case, for example, the CUSTOMER_ID can be used. We could simply cast it to a CHARACTER column, add a “special string’ in front of it, and write it into the Column CURRENCY_CD:
A disadvantage of this method, however, is that NULL values are no longer recognizable as such. So you might need some logic in the queries.
Do you want to know everything about joins in Teradata? Then read this article: