Teradata Joins – Reduce Skewing with RANDOM()

Roland Wenzlofsky

August 26, 2015

minutes reading time

Sometimes natural skew can become a huge problem, causing bad join performance. The Partial Duplication & Partial Redistribution (PDPR)  feature on Teradata 14 (and above) helps reduce this issue. Still, it will not always be able to detect all possible applications (for example, if statistics are not revealing the skewed values).

If you are stuck on a version below Teradata 14, you will not have PDPR. A technical trick comes into play: Skew reduction with the RANDOM() function.

Teradata Joins are vulnerable to Skewing.

Let’s carve out the details step by step.

We deal with a join between the two tables below in our example. There are some known facts about these tables:

  1. Table 1 doesn’t contain any row with ‘A’ in the join column. Furthermore, it’s ensured that there will never be value ‘A’ added to this column
  2. The join column of Table 2 contains many rows with the value ‘A’. It’s skewed (our example tables only include a few rows, but that’s enough to get the idea)
  3. Both tables are evenly distributed across all AMPs on their primary index column (called “primary key” in our example)
  4. There exists a range of join column values which never is available in the regular data of Table 2 (we will explain later why this is a must)

Above listed facts is a requirement for our test setup to work.

When left joining table 1 with table 2 on the join column, rows of both tables have to be rehashed on this column. We assume in our example that Teradata redistributes both tables.

The alert reader knows about the Teradata architecture and will recognize that Teradata will move all rows with the value ‘A’ in the join column to the same AMP. The spool of table 2 will be heavily skewed. On the other hand, will the redistribution of table 1 most likely result in an evenly distributed spool? During the joining step, the AMP holding table 2 rows with value ‘A’ will have to do much more work than the other AMPs.

How can we avoid skewing in this case? The RANDOM() function can help us:

We know that table 1 is not containing any rows with the value ‘A’ in the join column (and never will, by definition). This information allows us to conclude that we will have no matches for any table 2 row containing ‘A’.

We can easily replace ‘A’ in table 2 with a range of values not available as regular data (see requirement four above). Here comes the RANDOM() function into play.

We can generate a range of values, such as TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))), which are replacing the ‘A’ values of Table 2. As these values are non-existent in regular data, this will not have any impact on the result set:

SELECT PrimaryKey, TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))) AS JoinColumn FROM Table2
) PRIMARY INDEX (PrimaryKey);

Instead of joining Table2 with Table1, we will join NOT_SKEWING_TABLE2 with table1. The redistribution of table NOT_SKEWING_TABLE2 will not cause a skewed spool as the randomly generated values will spread evenly across all AMPs.

As a result, we will have an evenly distributed workload across all AMPs; the join will perform much better than without this trick. I improved the run time of a query from 4 hours to 3 minutes by applying the above method.

  • Avatar
    sobhan bhaduri says:

    Select ……………… from table1
    left outer join table2
    case when table2.Joincolumn = ‘A’ then cast(random(1,100) as varchar(3))
    else table1.Joincolumn = table2.Joincolumn end:

  • Thanks for the detailed explanation. You are absolutely right, a more useful application of the RANDOM() function is your last example. Thanks again. Roland.

  • Avatar
    Siddesh Pawar says:

    Instead of using the Volatile table, the quick way to get rid of skew value is to add a filter condition in the join clause as “Table2.JoinColumn ‘A’ “, since Table 2 is an inner table only matching records will be projected from this table.

    This optimization usually applies when the outer table has skewed values in join columns. Example:

    Select ……………… from table1 left outer join table2 on
    table1.Joincolumn = table2.Joincolumn;

    Let’s consider the table1 join column has a particular skewed value like ‘A’ which does not exist in table2. We cannot filter out this value as table1 is an outer table and all matching and non-matching values need to be projected.

    We can solve this skew issue using below random() function technique,

    Select ……………… from table1 left outer join table2 on
    case when table1.Joincolumn = ‘A’ then cast(random(1,100) as varchar(3)) else table1.Joincolumn = table2.Joincolumn;

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like