Teradata Join Strategies: How to Optimize Join Operations

Roland Wenzlofsky

May 3, 2023

minutes reading time

Introduction

Teradata offers several methods for conducting joins, but all necessitate one prerequisite.

The paired table rows must reside on identical AMPs.

The chosen method for joining and relocating data is called a join strategy.

The preparation for each join method varies. The choice of Teradata join strategy utilized by the Optimizer is determined by the cost evaluation of each strategy.

Cost factors for Teradata include number of table rows, required data copying, and sorting. The Optimizer relies on accurate statistics to properly capture data demographics.

How Teradata Optimizes the Join Operations

Teradata minimizes spool usage by employing efficient join techniques that reduce the amount of copied data.

Copy only the columns required by the request
Apply the WHERE conditions before copying rows to a common AMP
Whenever possible, spool the smaller table

Here is a basic illustration of calculating the necessary spool.

Table “Customer” has 1000 rows, each needing 100 bytes.
Table “Sale” has 10000 rows, each needing 500 bytes.

SELECT t01.*,t02.SalesId 
FROM Customer t01 INNER JOIN Sales t02 
ON t01.CustomerId = t02.CustomerId;

Selecting all columns of the “Customer” table requires 100,000 bytes of spool space (1000 x 100).

We select only the “SalesId” column from the “Sales” table, assuming it is an 8-byte INTEGER data type, which would require 80,000 bytes of spool space.

The Optimizer uses the calculated spool space to determine the distribution of tables to a common AMP.

How You Can Optimize the Join Operations

Collect statistics on all joined columns
Don’t apply functions on the join columns
Always do equality joins to avoid a product join

This article will explore frequently employed join techniques and their associated strategies.

teradata join — Join Methods

1. Teradata Merge Join – The Swiss Army Knife

When will it be used?

The Optimizer can only use it for equality joins

Preconditions

The rows to be joined have to be on a common AMP
Teradata needs to sort both spools by the row hash calculated over the join column(s)

Merge Joins Algorithms

Operation

Slow Path (The Optimizer uses an index on the left table)

The ROWHASH of each qualifying row in the left spool is used to lookup matching rows with identical ROWHASH in the right spool (using a binary search as both spools are sorted by row hash)

teradata merge join

Fast Path (The Optimizer uses no index)

No Index is used. The tables are fully scanned, alternating from the left to the right side; the row hash match is done.

teradata fast path

The Non-PPI Merge Join Strategies

Possible Join Preparation

teradata merge join

Re-Distribution of one or both spools by ROWHASH or
Duplication of the smaller spool to all AMPs
Sorting of one or both spools by the ROWID

The join columns define the common AMP of rows from two spools the Optimizer wants to join. This leaves us with 4 data distribution scenarios:

The Primary Indexes (or any other suitable index) of both tables equal the join columns: No join preparation is needed as the rows to be joined are already on the common AMP and sorted by the row hash.

Hint: The Primary Index of both tables needs to include all join columns, but it can have additional columns. These are applied as residual conditions. It’s only crucial that
Only one table’s primary index (or any other suitable index) matches the join columns: The second table’s rows must be relocated to the common AMP.
Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other appropriate index) of the second table matches the joined columns: The rows of both tables must be relocated to the common AMP.

To relocate rows to the shared AMP, you can redistribute them using the ROWHASH join column(s), or duplicate the smaller table across all AMPs.

If the Optimizer chooses to replicate the smaller table to all AMPs, it will then be sorted in the subsequent step by computing the rowhash of the joined columns and arranging them in order.

If the primary index does not match the joined columns, the bigger table must also undergo hashing and sorting. Subsequently, the bigger table is spooled locally to the AMP.

Performance Considerations

This join’s efficiency relies on identical primary indexes in both tables, which eliminates the need for Teradata to transfer any rows between AMPs.

The join method is highly robust in various aspects, as any performance loss is limited. This occurs in cases where the optimizer has underestimated the number of rows in tables due to incorrect or missing statistics.

Additionally, as both tables are sorted by rowhash, Teradata only needs to copy each data block into the main memory once, providing an extra benefit.

The PPI Merge Join Strategies

The Optimizer can utilize all methods, even when joining involves PPI tables.

If a non-partitioned table is joined with a PPI table, the Optimizer can transform the PPI table into a non-partitioned table and apply one of the strategies.

The Optimizer can partition a non-partitioned table like a PPI table when joining with an un-partitioned table.

This is only feasible if the non-partitioned table contains all partition columns.

The Optimizer can utilize unique merge join methods if there is one partitioned table.

Rowkey-Based Merge Join
Single Window Merge Join
Sliding Window Merge Join

The Rowkey-Based Merge Join

The Rowkey-Based Merge Join can be direct; neither of the two tables involved is spooled.

To perform a rowkey-based merge join, it is necessary for the tables to share identical primary indices and partitioning. Additionally, the join must be an equijoin conducted through the primary index and partition columns.

If these requirements are not met, the Optimizer may still opt for a merge join based on the rowkey, but it will require preparation for the join rather than a direct join.

In a PPI table join with an unpartitioned table, the optimizer can rearrange the unpartitioned table to match the primary index and partitions of the PPI table to execute a rowkey-based merge join.

The Rowkey-Based Merge Join and CHARACTER SETS

When joining two PPI tables that use Character Partitioning, it is essential to ensure that the CHARACTER SET of the partition columns is identical. Otherwise, the Rowkey-Based Merge Join cannot be direct, but the table with CHARACTER SET LATIN must first be spooled to make the row keys of both tables identical.

Here is an example. Both tables are identical, except for the CHARACTER SET of Column PartCol1.

Explain SELECT *
FROM MergeJoinChar1 t01
INNER JOIN
MergeJoinChar2 t02
ON
 t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
  <strong>4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
     of an all-rows scan with no residual conditions into Spool 2
     (all_amps), which is built locally on the AMPs.  Then we do a SORT
     to partition Spool 2 by rowkey.  The size of Spool 2 is estimated
     with low confidence to be 73,420 rows (3,450,740 bytes).  The
     estimated time for this step is 0.29 seconds.</strong>
  <strong>5) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     Spool 2 (Last Use) by way of a RowHash match scan.  DWHPRO.t02 and
     Spool 2 are joined using a rowkey-based merge join, with a join
     condition of ("((TRANSLATE((PartCol1 )USING LATIN_TO_UNICODE))=
     DWHPRO.t02.PartCol1) AND (PK = DWHPRO.t02.PK)").  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with low confidence to be 97,878
     rows (4,796,022 bytes).  The estimated time for this step is 0.25
     seconds.</strong>
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -&gt; The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.53 seconds.

CREATE MULTISET   TABLE MergeJoinChar2
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET LATIN

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;
CREATE MULTISET   TABLE MergeJoinChar2
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET UNICODE

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;

An Example of a Rowkey-Based Merge Join

The join, in this instance, utilizes the Primary Index (PK) and Partition Columns (PartCol1) that are identical in both tables. Teradata executes a direct rowkey-based merge join, eliminating the need to spool tables.

Explain SELECT *
FROM MergeJoin1 t01
INNER JOIN
MergeJoin2 t02
ON
 <strong>t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;</strong>

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
 <strong> 4) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     DWHPRO.t01 by way of a RowHash match scan with no residual
     conditions.  DWHPRO.t02 and DWHPRO.t01 are joined using a
     rowkey-based merge join, with a join condition of (
     "(DWHPRO.t01.PartCol1 = DWHPRO.t02.PartCol1) AND (DWHPRO.t01.PK =
     DWHPRO.t02.PK)").</strong>  The result goes into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with low confidence to be 73,414 rows (2,716,318 bytes).
     The estimated time for this step is 0.15 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -&gt; The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.

An Example of changing both PPI Tables to Non-PPI Tables

The join, in this instance, is executed on the primary index, which is identical in both tables, and the partition columns are excluded from the join condition. The tables are of equivalent size.

The Optimizer modifies PPI and non-PPI tables by spooling and sorting them according to the rowhash of their primary index before executing a conventional rowhash match scan.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     <strong>t01.PK = t02.PK;</strong>
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
  <strong> 4) We execute the following steps in parallel.</strong>
        1) <strong>We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t02 by
           way of an all-rows scan with no residual conditions into
           Spool 2 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 2 by the hash code of (
           DWHPRO.t02.PK).  The size of Spool 2 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.</strong>
        2) <strong>We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by
           way of an all-rows scan with no residual conditions into
           Spool 3 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 3 by the hash code of (
           DWHPRO.t01.PK).  The size of Spool 3 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.</strong>
   5) <strong>We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to Spool 3 (Last Use)
      by way of a RowHash match scan.  Spool 2 and Spool 3 are joined
      using a merge join, with a join condition of ("PK = PK").  The
      result goes into Spool 1 (group_amps), which is built locally on
      the AMPs.  The size of Spool 1 is estimated with low confidence to
      be 73,414 rows (2,716,318 bytes).  The estimated time for this
      step is 0.17 seconds.</strong>
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -&gt; The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.44 seconds.

The Sliding Window Merge Join

The Sliding Window Merge Join is utilized to join a PPI table with:

An unpartitioned table
A PPI table that is not identically partitioned

This post provides a comprehensive explanation of the sliding window merge join.

Teradata Sliding Window Merge Join: A Detailed Explanation

An Example of a Teradata Sliding Window Merge Join

In this instance, the join solely employs the identical primary index in both tables. The sole contrast from the previous arrangement is that the second table’s magnitude is twice as large.

The Optimizer converts the smaller tables into a non-PPI table and performs a sliding window merge join.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     t01.PK = t02.PK;
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
   <strong>4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
      of an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is built locally on the AMPs.  Then we do a SORT
      to order Spool 2 by the hash code of (DWHPRO.t01.PK).  The size of
      Spool 2 is estimated with high confidence to be 73,414 rows (
      1,541,694 bytes).  The estimated time for this step is 0.27
      seconds.</strong>
   <strong>5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to DWHPRO.t02 by way
      of a RowHash match scan with no residual conditions.  Spool 2 and
      DWHPRO.t02 are joined using a sliding-window merge join, with a
      join condition of ("PK = DWHPRO.t02.PK").  The result goes into
      Spool 1 (group_amps), which is built locally on the AMPs.  The
      size of Spool 1 is estimated with low confidence to be 146,828
      rows (5,432,636 bytes).  The estimated time for this step is 0.78
      seconds.</strong>
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -&gt; The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 1.05 seconds.<strong>
</strong>

This is an optimized query that utilized a sliding window merge join and achieved success.

How to Optimize Joins in Teradata: A Case Study

2. Teradata Hash Join – The Sprinter, but only if executed in memory

When will it be used?

The Optimizer can only use it for equality joins

Preconditions

The rows to be joined have to be on a common AMP
The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache.
The larger spool stays unsorted

Operation

The larger spool is a full table scanned row by row
Each ROWID from the bigger spools is searched in, the smaller spool (with a binary search)

Possible Join Preparation

Re-Distribution of the smaller spool by ROWHASH or
Duplication of the smaller spool to all AMPs
Sorting of the smaller spool

Performance Considerations

Hash joins perform better than merge joins when meeting specific requirements.

teradata hash join

3. Teradata Nested Join – The Fastest but scarce

Preconditions

Spool 1 allows a unique ROWHASH access (a unique index is defined)
Spool 2 allows any ROWHASH access (a unique or not unique index is defined)
Nested joins must have equality join condition

Operation

The qualifying row of spool one is accessed using any unique index (UPI, USI). Rarely can it be a NUPI, only if the Optimizer knows it returns one row.
The row is relocated to the AMPs owning these rows of spool 2. A Nested Join uses a few AMPs if the index of Spool 2 is a UPI, NUPI, or USI. It used all-AMPs if the index of Spool 2 is a NUSI.
Spool 2 is full table scanned, and each row is combined with the one row from Spool 1

Possible Join Preparation

None

Performance Considerations

The nested join is highly efficient as it only requires one AMP, thus making it ideal for tactical workloads.

Interested in optimizing tactical workload? Keep reading here:

Optimizing the Tactical Workload in Teradata: Skills and Tools Needed

If you require in-depth knowledge on nested join, this article is tailored to your needs.

Introduction to Teradata Nested Join: The Cheapest Join Method Using Indexes

4. Teradata Product Join – The disliked Guest

https://youtu.be/05OBzMIJHto

When will it be used?

The join condition is an inequality condition
Join conditions are OR combined
A referenced table is not used in the join condition
The Product join is the cheapest join method

Preconditions

The rows to be joined have to be on the AMP
Neither of the two spools needs to be sorted

Operation

Teradata does a full table scan on the smaller spool and
Each qualifying row of spool one is compared with each row of spool 2

Possible Join Preparation

Re-Distribution of one or both spools by ROWHASH or
Duplication of the smaller spool

Performance Considerations

A product join costs are usually very high since n*m comparisons must be carried out (n = the number of rows in table1, m = number of rows in table2).

In addition to the numerous comparisons, product joins have the drawback of requiring multiple copies of data blocks from memory to main memory if Teradata cannot retain the small table completely in the main memory.

Featured Image without Sidebar

Take a brief survey and receive the book for free Get instant access

Previous Article

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

The 4 Principal Teradata Join Strategies
Hi Roland – Greetings! These concepts are useful. I would appreciate if you could please post or send me few sample queries. I am a Data Scientist/Analyst and I do not get involved at the AMP Level. This is kool. Need your help. Thank you. Regards, Sri

Hello Roland,

I like your articles. They have always described the subject very well and you provide a good parallel with real life.

I’d like to add something to the article:
1) What statement contains some inaccuracy
>> “The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache”
Smaller rowset(spool is a bit incorrect term here) is not actually sorted, but a hash table is built based on hashing of smaller row set( of some part of a smaller row set if the hash table does not fit in memory).

2) >> The bigger spool is full table scanned row by row
Usually, the bigger rowset is full table scanned by cylinder reads and probes row by row

3) >> Sorting of the smaller spools
Sorting is wrong. Building a hash table under a smaller row set

4) It will be good to add to the requirements for the hash, what it can happen on equality join.

5)It may be added what if the hash table does not fit in FSG cache then the smaller table will be divided into some smaller parts, named Fanout, and the hash table will be built for every Fanout and the larger row set will be probed for every Fanout. That fact may multiply count of full table scans depends on Fanout count, which in turn may dramatically reduce performance.

6) It will be good to add to the requirements for nested loops, what can happen on equality join.

7) I would also suggest changing requirements for nested join according to documentation:
“There is a join on a column of the row specified by the first table to any primary index or
USI of the second table. In rare cases, the index on the second table can be a NUSI. ”

8) Small notice that a merge join may contain a product join inside.
I make an example in my comment to https://www.dwhpro.com/teradata-merge-join-vs-product-join/ article.

Thank you!

Best regards,
Aleksei Svitin.

You might also like

Fast multi-file export of Teradata query results using only Teradata SQL Assistant

The Teradata AMP Worker Task

Boost Your Teradata Performance – The Critical Role of NOT NULL Declarations

Optimizing Teradata SQL Queries by Avoiding Full Table Scans and Utilizing Secondary Indexes