Teradata Joins – A Simple Guide To Avoid Skew

Roland Wenzlofsky

October 21, 2019

minutes reading time


Skewed Teradata Joins – The Initial Situation

Let’s take the following scenario as an example. One table contains currencies. The other table has our customers’ accounts, including the currency in which the account is managed.
This means that the ISO code of the currency is the foreign key of the account table:

CREATE TABLE Currency
(
   CURRENCY_CD  VARCHAR(20) NOT NULL,
   CURRENCY_NAME VARCHAR(200),  
) PRIMARY INDEX (CURRENCY_CD);

CREATE TABLE Customer
(
  CUSTOMER_ID  INTEGER NOT NULL,
  CUSTOMER_NAME VARCHAR(255),
  ...
  CURRENCY_CD CHAR(20)
) PRIMARY INDEX (CUSTOMER_ID);

CUSTOMER TABLE

CUSTOMER_IDCUSTOMER_NAMECURRENCY_CD
1 Nina LoweryEUR
2 Alexia NealUSD
3 Kyla ChanNULL
4 Alesha FerrellNULL
5 Cara AdamsNULL
6 Abigail LarsenNULL
7 Amie MasseyNULL

CURRENCY TABLE

CURRENCY_CDCURRENCY_NAME
EUREuro
USDUS Dollar
AUDAustralian Dollar
HUFHungarian Forint

Join using the Column CURRENCY_CD requires that both tables have this column as their primary index and that the rows of both tables are available on the relevant AMPs.

As you can see from our example table for accounts, most accounts have no currency assigned (NULL). Therefore, the join will be skewed.

Even though current versions of Teradata make appropriate optimizations to reduce this problem (e.g., by re-hashing the rows of well-distributed values but copying rows with a NULL value to all AMPs), this problem persists.

Especially if the optimizer does not have the corresponding statistics about biased values or only has them obsolete.

A Possible Solution To This Problem

One possible solution is to replace the NULL value with a value that is as distinct as possible.

In our case, for example, the CUSTOMER_ID can be used. We could cast it to a CHARACTER column, add a “special string’ in front of it, and write it into the Column CURRENCY_CD:

CUSTOMER_IDCUSTOMER_NAMECURRENCY_CD
1 Nina LoweryEUR
2 Alexia NealUSD
3 Kyla Chan#NULL#3
4 Alesha Ferrell#NULL#4
5 Cara Adams#NULL#5
6 Abigail Larsen#NULL#6
7 Amie Massey#NULL#7

However, a disadvantage of this method is that NULL values are no longer recognizable. So you might need some logic in the queries.

Do you want to know everything about joins in Teradata? Then read this article:

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>