Tag Archive

Tag Archives for " hash join "
7

The 4 Principal Teradata Join Strategies

Teradata has various join methods to choose from. Each of these methods places different demands on data distribution. But they all have one thing in common: the rows of both tables must be available for joining on a common AMP.

It is therefore usually necessary to copy the rows of one or both tables. The join method together with the selected type of copying the data is called a Teradata join strategy.

When analyzing the execution plan of your queries, you will mainly see one of the join methods described below.

This article analyzes how every join method works and highlights the advantages and disadvantages.

Data demography has a great influence on the execution plan and the chosen join strategy.

Requirements:

  • The rows to be joined have to be on a common AMP
  • Both spools must be sorted by the ROWID calculated over the join column(s)

Process:

  • The ROWHASH of each qualifying row in the left spool is used to lookup matching rows with identical ROWHASH in the right spool (with a binary search as both spools are sorted by ROWID)

Possible Join Preparations required:

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the ROWID

1. Teradata Merge Join – The Swiss Army Knife

teradata merge join

The common AMP of rows from two spools being joined is defined by the join columns. This leaves us with 3 data distribution scenarios:

  1. The Primary Indexes (or any other suitable index) of both tables equals the join columns: No join preparation is needed as the rows to be joined are already on the common AMP

2. Only the Primary Index (or any other suitable index)  of one table matches the join columns: The rows of the second table have to be relocated to the common AMP

3. Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other suitable index)  of the second table matches the join columns: The rows of both tables have to be relocated to the common AMP

Relocation of rows to the common AMP can be done by redistribution of the rows by the join column(s) ROWHASH or by copying the smaller table to all AMPs.

2. Teradata Hash Join – The Sprinter, but only if executed in FSG Cache

Requirements:

  • The rows to be joined have to be on a common AMP
  • The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache
  • The bigger spool stays unsorted

Process:

  • The bigger spool is full table scanned row by row
  • Each ROWID from the bigger spools is searched in the smaller spool (with a binary search)

Possible Join Preparations required:

  • Re-Distribution of the smaller spool by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of the smaller spools
teradata hash join

3. Teradata Nested Join – The Fastest, but scarce

Requirements:

  • Spool 1 allows a unique ROWHASH access (a unique index is defined)
  • Spool 2 allows any kind of ROWHASH access (a unique or not unique is index defined)
  • Nested joins must have an equality join condition

Process:

  • The qualifying row of spool 1 is accessed by the usage of any unique index (UPI, USI). Rarely it can be a NUPI if the optimizer knows that only one row is returned.
  • The row is relocated to the AMPs owning these rows of spool 2.
    A Nested Join is using a few AMPs if the index of Spool 2 is a UPI, NUPI, or USI. It used all-AMPs if the index of Spool 2 is a NUSI.
  • Spool 2 is full table scanned and each row is combined with the one row from Spool 1

Possible Join Preparations required:

  • None

4. Teradata Product Join – The disliked Guest

Requirements:

  • The rows to be joined have to be on the AMP
  • No spool needs to be sorted!

Process:

  • A full table scan is done on the smaller spool and
  • Each qualifying row of spool 1 is compared against each row of spool 2

Possible Join Preparations required:

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool
Teradata Quiz
Teradata Quiz
Developer: Roland Wenzlofsky
Price: Free

>