How To Reduce Skew In Teradata Joins Using The RANDOM() Function

Natural skew may hinder join performance, but the PDPR capability in Teradata 14 can alleviate this problem. However, PDPR may not detect all potential applications if skewed values are not evident in the statistics.

If your Teradata version is below 14, PDPR is unavailable. To address this, you can use a technique based on the RANDOM() function to reduce skew.

Teradata Joins are vulnerable to Skewing.

Let’s walk through the specifics step by step.

Our example involves a join between the two tables shown below. Certain facts are already established about these tables:

Table 1 contains no rows with ‘A’ in the join column. Furthermore, it’s ensured that the value ‘A’ will never be added to this column.
The join column of Table 2 contains many rows with the value ‘A’. It’s skewed (our example tables only include a few rows, but that’s enough to get the idea).
Both tables are evenly distributed across all AMPs on their primary index column (called “primary key” in our example).
There exists a range of join column values that are never present in the regular data of Table 2 (we will explain later why this is a requirement).

The above facts are necessary for our testing framework to operate.

By performing a left join of Table 1 and Table 2 on the common column, both tables’ rows are hashed on this column. Assuming Teradata redistributes both tables, this is exemplified.

The informed reader understands Teradata’s architecture and will recognize that all rows containing ‘A’ in the join column will be relocated to a single AMP, causing Table 2’s spool to be imbalanced. Conversely, the redistribution of Table 1 should yield an evenly distributed spool. When joining, the AMP holding Table 2’s rows with ‘A’ will bear a greater workload than the other AMPs.

We can use the RANDOM() function to prevent skewing in this scenario.

Table 1 does not contain any rows with the value ‘A’ in the join column, meaning there will be no matches for any row in Table 2 containing ‘A’.

We can easily replace ‘A’ in Table 2 with a range of values unavailable as regular data (see requirement four above). This is where the RANDOM() function comes into play.

We can generate a range of values, such as TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))), which replace the ‘A’ values of Table 2. As these values are non-existent in regular data, this will not affect the result set:

CREATE VOLATILE TABLE NOT_SKEWING_TABLE2
AS
(
SELECT PrimaryKey, TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))) AS JoinColumn FROM Table2
) PRIMARY INDEX (PrimaryKey);

We will join NOT_SKEWING_TABLE2 with Table 1 instead of Table 2. Redistributing NOT_SKEWING_TABLE2 will not result in a skewed spool because the randomly generated values will be evenly distributed across all AMPs.

By implementing this technique, our workload will be uniformly distributed among all AMPs, resulting in improved performance for the join operation. Applying this approach reduced the query’s runtime from 4 hours to just 3 minutes.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

3 thoughts on “How to Reduce Skew in Teradata Joins Using the RANDOM() Function”

Select ……………… from table1
left outer join table2
on
case when table2.Joincolumn = ‘A’ then cast(random(1,100) as varchar(3))
else table1.Joincolumn = table2.Joincolumn end:

Thanks for the detailed explanation. You are absolutely right, a more useful application of the RANDOM() function is your last example. Thanks again. Roland.

Instead of using the Volatile table, the quick way to get rid of skew value is to add a filter condition in the join clause as “Table2.JoinColumn ‘A’ “, since Table 2 is an inner table only matching records will be projected from this table.

This optimization usually applies when the outer table has skewed values in join columns. Example:

Select ……………… from table1 left outer join table2 on
table1.Joincolumn = table2.Joincolumn;

Let’s consider the table1 join column has a particular skewed value like ‘A’ which does not exist in table2. We cannot filter out this value as table1 is an outer table and all matching and non-matching values need to be projected.

We can solve this skew issue using below random() function technique,

Select ……………… from table1 left outer join table2 on
case when table1.Joincolumn = ‘A’ then cast(random(1,100) as varchar(3)) else table1.Joincolumn = table2.Joincolumn;

How to Reduce Skew in Teradata Joins Using the RANDOM() Function

Teradata Joins are vulnerable to Skewing.

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

3 thoughts on “How to Reduce Skew in Teradata Joins Using the RANDOM() Function”

Leave a Comment Cancel reply