Learning The 4 Principal Teradata Join Methods Is Not Difficult At All!

Learning The 4 Principal Teradata Join Methods Is Not Difficult At All! You Just Need A Great Teacher!

teradata secondary index

Introduction

Teradata has several methods available to perform a join. However, they all have one requirement in common:

The rows of two tables that are joined must be located on the same AMP.

The selected join method together with the way in which rows to-be-joined are copied to a common AMP is called a join strategy.

Each join method requires more or less preparation to be performed. Which Teradata join strategy is used depends on how the optimizer evaluates the costs of each join strategy.

Cost factors include the number of table rows, how many rows have to be copied, whether sorting is necessary and much more. As always, it is essential that the optimizer has correct statistics available to capture the data demography correctly.

How Teradata optimizes the Join Operations

Teradata always uses the following methods to minimize the amount of data copied as part of a join (minimizing the spool usage):

  • Copy only the columns required by the request
  • Apply the WHERE conditions before copying rows to a common AMP
  • Whenever possible spool the smaller table

Here is a simple example of how to calculate the required spool.

Table “Customer” has 1000 rows and each row needs 100 bytes.
Table “Sale” has 10000 rows and each row needs 500 bytes.

SELECT t01.*,t02.SalesId 
FROM Customer t01 INNER JOIN Sales t02 
ON t01.CustomerId = t02.CustomerId;

Since we select all columns for table “Customer”, the required spool space is 1000 x 100 = 100,000 bytes.

From the table “Sales” we select only the column “SalesId”. Let us assume that “SalesId” is of the data type INTEGER (8 Bytes), the required spool space is 10,000 x 8 = 80,000 bytes.

The above-calculated spool space forms the basis for the optimizer when selecting the distribution of the tables to a common AMP.

How You Can optimize the Join Operations

In this article, we will learn about the most commonly used join methods and their related join strategies:

teradata join
Join Methods

1. Teradata Merge Join – The Swiss Army Knife

When will it be used?

  • Is only used for equality joins

Preconditions

  • The rows to be joined have to be on a common AMP
  • Both spools must be sorted by the row hash calculated over the join column(s)

Merge Joins Algorithms

Operation

Slow Path (Index is used on the left table)

  • The ROWHASH of each qualifying row in the left spool is used to lookup matching rows with identical ROWHASH in the right spool (with a binary search as both spools are sorted by row hash)
teradata merge join

Fast Path (No index is used)

  • No Index is used, the tables are full table scanned, alternating from the left to the right side the row hash match is done.
teradata fast path

The Non-PPI Merge Join Strategies

Possible Join Preparation

teradata merge join
  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the ROWID

The common AMP of rows from two spools being joined is defined by the join columns. This leaves us with 4 data distribution scenarios:

  1. The Primary Indexes (or any other suitable index) of both tables equal the join columns: No join preparation is needed as the rows to be joined are already on the common AMP and sorted by the row hash.

    Hint: The Primary Index of both tables needs to include all join columns, but it can have additional columns as well. These are applied as residual conditions. It's only important that
  2. Only the Primary Index (or any other suitable index)  of one table matches the join columns: The rows of the second table have to be relocated to the common AMP
  3. Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other suitable index)  of the second table matches the join columns: The rows of both tables have to be relocated to the common AMP

Relocation of rows to the common AMP can be done by redistribution of the rows by the join column(s) ROWHASH or by copying the smaller table to all AMPs.

If the smaller table is copied to all AMPs, it must be sorted in the next step (calculating the rowhash of the join columns and sorting).

The larger table must also be hashed and sorted according to the join columns if the primary index does not already correspond to the join columns. In this case, the larger table is spooled AMP-local.

Performance Considerations

If both tables have the same primary index, this join is very efficient, since no rows have to be copied between AMPs.

This join method is also very robust in other respects since the performance loss is bound if, for example, the number of rows in the tables was underestimated due to incorrect or missing statistics.

Another advantage: Since both tables are sorted by rowhash, each data block only has to be copied once into the main memory.

The PPI Merge Join Strategies

All methods mentioned above can be used by the optimizer even if PPI tables are involved in joining.

For example, if a non-partitioned table and a PPI table are joined, the optimizer can convert the PPI table into a non-partitioned table and then use one of the strategies listed above.

On the other hand, the optimizer can also partition the non-partitioned table in the same way as the PPI table when there is a join between a PPI table and an un-partitioned table.

However, this is only possible if all partition columns are present in the non-partitioned table.

If at least one table is partitioned, the following special merge join methods are also available to the optimizer:

The Rowkey-Based Merge Join

The Rowkey-Based Merge Join can be a direct join, that is, neither of the two tables involved is spooled.

A prerequisite for a rowkey-based merge join is that both tables have the same primary index and the same partitioning. Furthermore, the join must be an equijoin across the primary index columns and all partition columns.

If these prerequisites are not fulfilled (and an equijoin is done), the optimizer can still decide on a rowkey-based merge join, but join preparation is required (i.e. no direct join).

For example, in the case of a join between a PPI table and an unpartitioned table, the optimizer can restructure the unpartitioned table to the same primary index and the same partitions as the PPI table and then perform the rowkey-based merge join.

The Rowkey-Based Merge Join and CHARACTER SETS

When joining two PPI tables that use Character Partitioning, it is important to make sure that the CHARACTER SET of the partition columns are identical, otherwise, the Rowkey-Based Merge Join cannot be a direct join, but the table with CHARACTER SET LATIN must first be spooled to make the row keys of both tables identical.

Here is an example. Both tables are identical, except for the CHARACTER SET of the Column PartCol1.

CREATE MULTISET   TABLE MergeJoinChar2
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET LATIN

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;
CREATE MULTISET   TABLE MergeJoinChar2
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET UNICODE

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;
Explain SELECT *
FROM MergeJoinChar1 t01
INNER JOIN
MergeJoinChar2 t02
ON
 t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
  4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
     of an all-rows scan with no residual conditions into Spool 2
     (all_amps), which is built locally on the AMPs.  Then we do a SORT
     to partition Spool 2 by rowkey.  The size of Spool 2 is estimated
     with low confidence to be 73,420 rows (3,450,740 bytes).  The
     estimated time for this step is 0.29 seconds.
  5) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     Spool 2 (Last Use) by way of a RowHash match scan.  DWHPRO.t02 and
     Spool 2 are joined using a rowkey-based merge join, with a join
     condition of ("((TRANSLATE((PartCol1 )USING LATIN_TO_UNICODE))=
     DWHPRO.t02.PartCol1) AND (PK = DWHPRO.t02.PK)").  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with low confidence to be 97,878
     rows (4,796,022 bytes).  The estimated time for this step is 0.25
     seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.53 seconds.

An Example for a Rowkey Based Merge Join

In this example, the join is done using Primary Index (PK) and Partition Columns (PartCol1) which are the same in both tables. A direct rowkey-based merge join (without spooling tables) can be performed:

Explain SELECT *
FROM MergeJoin1 t01
INNER JOIN
MergeJoin2 t02
ON
 t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
  4) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     DWHPRO.t01 by way of a RowHash match scan with no residual
     conditions.  DWHPRO.t02 and DWHPRO.t01 are joined using a
     rowkey-based merge join, with a join condition of (
     "(DWHPRO.t01.PartCol1 = DWHPRO.t02.PartCol1) AND (DWHPRO.t01.PK =
     DWHPRO.t02.PK)").  The result goes into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with low confidence to be 73,414 rows (2,716,318 bytes).
     The estimated time for this step is 0.15 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.

An Example for changing both PPI Tables to Non-PPI Tables

In this example, the join is performed on the primary index which is the same in both tables. The partition columns are not in the join condition. Both tables have the same size.

The optimizer decides to change both PPI tables to non-PPI tables (by spooling and sorting by rowhash of the primary index). A classic rowhash match scan then takes place.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     t01.PK = t02.PK;
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
   4) We execute the following steps in parallel.
        1) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t02 by
           way of an all-rows scan with no residual conditions into
           Spool 2 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 2 by the hash code of (
           DWHPRO.t02.PK).  The size of Spool 2 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.
        2) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by
           way of an all-rows scan with no residual conditions into
           Spool 3 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 3 by the hash code of (
           DWHPRO.t01.PK).  The size of Spool 3 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to Spool 3 (Last Use)
      by way of a RowHash match scan.  Spool 2 and Spool 3 are joined
      using a merge join, with a join condition of ("PK = PK").  The
      result goes into Spool 1 (group_amps), which is built locally on
      the AMPs.  The size of Spool 1 is estimated with low confidence to
      be 73,414 rows (2,716,318 bytes).  The estimated time for this
      step is 0.17 seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.44 seconds.

The Sliding Window Merge Join

The Sliding Window Merge Join is used to join a PPI table with:

  • An unpartitioned table
  • A PPI table that is not identically partitioned

The Sliding Window Merge Join is described in detail in this post:

An Example for a Teradata Sliding Window Merge Join

In this example, the join is only performed using the primary index, which is the same in both tables. The only difference to the previous setup is, that the size of the second table is two times bigger.

This time the optimizer turns the smaller of the two tables into a non-PPI table and executes a sliding window merge join.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     t01.PK = t02.PK;
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
   4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
      of an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is built locally on the AMPs.  Then we do a SORT
      to order Spool 2 by the hash code of (DWHPRO.t01.PK).  The size of
      Spool 2 is estimated with high confidence to be 73,414 rows (
      1,541,694 bytes).  The estimated time for this step is 0.27
      seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to DWHPRO.t02 by way
      of a RowHash match scan with no residual conditions.  Spool 2 and
      DWHPRO.t02 are joined using a sliding-window merge join, with a
      join condition of ("PK = DWHPRO.t02.PK").  The result goes into
      Spool 1 (group_amps), which is built locally on the AMPs.  The
      size of Spool 1 is estimated with low confidence to be 146,828
      rows (5,432,636 bytes).  The estimated time for this step is 0.78
      seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 1.05 seconds.

2. Teradata Hash Join – The Sprinter, but only if executed in Memory

When will it be used?

  • Only can be used for equality joins

Preconditions

  • The rows to be joined have to be on a common AMP
  • The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache
  • The larger spool stays unsorted

Operation

  • The larger spool is full table scanned row by row
  • Each ROWID from the bigger spools is searched in the smaller spool (with a binary search)

Possible Join Preparation

  • Re-Distribution of the smaller spool by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of the smaller spool

Performance Considerations

Hash joins perform better than merge joins if certain requirements are met.

teradata hash join

3. Teradata Nested Join – The Fastest, but scarce

Preconditions

  • Spool 1 allows a unique ROWHASH access (a unique index is defined)
  • Spool 2 allows any kind of ROWHASH access (a unique or not unique is index defined)
  • Nested joins must have an equality join condition

Operation

  • The qualifying row of spool 1 is accessed by the usage of any unique index (UPI, USI). Rarely it can be a NUPI if the optimizer knows that only one row is returned.
  • The row is relocated to the AMPs owning these rows of spool 2.A Nested Join is using a few AMPs if the index of Spool 2 is a UPI, NUPI, or USI. It used all-AMPs if the index of Spool 2 is a NUSI.
  • Spool 2 is full table scanned and each row is combined with the one row from Spool 1

Possible Join Preparation

  • None

Performance Considerations

The nested join is the most efficient join because only one AMP is needed. It will be used mainly for tactical workloads,

4. Teradata Product Join – The disliked Guest

When will it be used?

  • The join condition is an inequality condition
  • Join conditions are OR combined
  • A referenced table is not used in the join condition
  • The Product Join is the cheapest join method

Preconditions

  • The rows to be joined have to be on the AMP
  • Neither of the two spools needs to be sorted

Operation

  • A full table scan is done on the smaller spool and
  • Each qualifying row of spool 1 is compared with each row of spool 2

Possible Join Preparation

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool

Performance Considerations

The costs of a product join are usually very high since n*m comparisons must be carried out (n = number of rows in table1, m = number of rows in table2).

Another disadvantage of product joins, apart from a large number of comparisons, is that data blocks must be copied several times from memory to main memory if the small table cannot be kept completely in main memory.

Teradata Quiz
Teradata Quiz
Developer: Roland Wenzlofsky
Price: Free

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Aleksei Svitin says:

    Hello Roland,

    I like your articles. They are always described the subject very well and you provide a good parallels with real life.

    I’d like to add something to the article:
    1) What statement contains some inaccuracy
    >> “The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache”
    Smaller rowset(spool is a bit incorrect term here) is not actually sorted, but a hash table is built based on hashing of smaller row set( of some part of smaller row set if the hash table does not fit in memory).

    2) >>The bigger spool is full table scanned row by row
    Usually the bigger rowset is full table scanned by cylinder reads and probes row by row

    3) >>Sorting of the smaller spools
    Sorting is wrong. Building a hash table under a smaller row set

    4) It will be good to add into the requirements for hash, what it can be happen on equality join.

    5)It may be added what if the hash table does not fit in FSG cache then the smaller table will be divided into some smaller parts, named Fanout, and the hash table will be built for every Fanout and the larger row set will be probed for every Fanout. That fact may multiply count of full table scans depends on Fanout count, what in turn may dramatically reduce performance.

    6) It will be good to add into the requirements for nested loops, what it can be happen on equality join.

    7) I would also suggest to change requirements for nested join according documentation:
    “There is a join on a column of the row specified by the first table to any primary index or
    USI of the second table. In rare cases, the index on the second table can be a NUSI. ”

    8) Small notice that a merge join may contain a product join inside.
    I make an example in my comment to https://www.dwhpro.com/teradata-merge-join-vs-product-join/ article.

    Thank you!

    Best regards,
    Aleksei Svitin.

  • Avatar Sri Ramanujam says:

    The 4 Principal Teradata Join Strategies
    Hi Roland – Greetings! These concepts are useful. I would appreciate if you could please post or send me few sample queries. I am a Data Scientist/Analyst and I do not get involved at the AMP Level. This is kool. Need your help. Thank you. Regards, Sri

  • >