teradata join strategies

When analyzing the execution plan of your queries, you will mainly see one of below described join strategies used. This post shows how each of them works and carves out advantages and disadvantages of each join strategy.

Data demographics heavily influence the execution plan and the selected join strategy.

 

1. Teradata Merge Join – The Swiss Army Knife

Requirements:

  • The rows to be joined have to be on a common AMP
  • Both spools must be sorted by the ROWID calculated over the join column(s)

Process:

  • The ROWHASH of each qualifying row in the left spool is used to look up matching rows with identical ROWHASH in the right spool (with a binary search as both spools are sorted by ROWID)

Possible Join Preparations required:

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the ROWID

teradata merge join

The common AMP of rows from two spools being joined is defined by the join columns. This leaves us with 3 data distribution scenarios:

1.The Primary Indexes (or any other suitable index) of both tables equals the join columns: No join preparation is needed as the rows to be joined are already on the common AMP

2.Only the Primary Index (or any other suitable index)  of one table matches the join columns: The rows of the second table have to be relocated to the common AMP

3.Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other suitable index)  of the second table matches the join columns: The rows of both tables have to be relocated to the common AMP

Relocation of rows to the common AMP can be done by redistribution of the rows by the join column(s) ROWHASH or by copying the smaller table to all AMPs.

2. Teradata Hash Join – The Sprinter, but only if executed in FSG Cache

Requirements:

  • The rows to be joined have to be on a common AMP
  • The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache
  • The bigger spool stays unsorted

Process:

  • The bigger spool is full table scanned row by row
  • Each ROWID from the bigger spools is searched in the smaller spool (with a binary search)

Possible Join Preparations required:

  • Re-Distribution of the smaller spool by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of the smaller spools

hash_join

3. Teradata Nested Join – The Fastest, but scarce

Requirements:

  • Spool 1 allows a unique ROWHASH access (a unique index is defined)
  • Spool 2 allows any kind of ROWHASH access (a unique or not unique is index defined)

Process:

  • The qualifying row of spool 1 is accessed by usage of any unique index.
  • The row is relocated to the AMP owning the rows of spool 2
  • Spool 2 is full table scanned and each row is combined with the one row from Spool 1

Possible Join Preparations required:

  • None

4. Teradata Product Join – The disliked Guest

Requirements:

  • The rows to be joined have to be on the AMP
  • No spool needs to be sorted!

Process:

  • A full table scan is done on the smaller spool and
  • Each qualifying row of spool 1 is compared against each row of spool 2

Possible Join Preparations required:

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool

product_join

 

 

Our Reader Score
[Total: 60    Average: 4.3/5]
The 4 Principal Teradata Join Strategies written by Roland Wenzlofsky average rating 4.3/5 - 60 user ratings

5 COMMENTS

  1. Roland,

    I really like the way you have described this very important topic. It is one of the best explanations I have seen. Thank you for this brilliant explanation. You have covered the most important topic and all who learn should really appreciate your guidance.

    Thank you again,

    Tom Coffing

LEAVE A REPLY

Please enter your comment!
Please enter your name here