Joins Archives - DWHPRO
Learning The 4 Principal Teradata Join Methods Is Not Difficult At All! You Just Need A Great Teacher!

Introduction

Teradata has several methods available to perform a join. However, they all have one requirement in common:

The rows of two tables that are joined must be located on the same AMP.

The selected join method together with the way in which rows to-be-joined are copied to a common AMP is called a join strategy.

Each join method requires more or less preparation to be performed. Which Teradata join strategy is used depends on how the optimizer evaluates the costs of each join strategy.

Cost factors include the number of table rows, how many rows have to be copied, whether sorting is necessary and much more. As always, it is essential that the optimizer has correct statistics available to capture the data demography correctly.

How Teradata optimizes the Join Operations

Teradata always uses the following methods to minimize the amount of data copied as part of a join (minimizing the spool usage):

  • Copy only the columns required by the request
  • Apply the WHERE conditions before copying rows to a common AMP
  • Whenever possible spool the smaller table

Here is a simple example of how to calculate the required spool.

Table “Customer” has 1000 rows and each row needs 100 bytes.
Table “Sales” has 10000 rows and each row needs 500 bytes.

SELECT t01.*,t02.SalesId 
FROM Customer t01 INNER JOIN Sales t02 
ON t01.CustomerId = t02.CustomerId;

Since we select all columns for table “Customer”, the required spool space is 1000 x 100 = 100,000 bytes.

From the table “Sales” we select only the column “SalesId”. Let us assume that “SalesId” is of the data type BIGINT (8 bytes); the required spool space is then 10,000 x 8 = 80,000 bytes.

The above-calculated spool space forms the basis for the optimizer when selecting the distribution of the tables to a common AMP.
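
As a quick sanity check, the spool arithmetic above can be reproduced in a few lines of Python (purely illustrative; real spool rows carry additional per-row overhead bytes that we ignore here):

```python
def spool_bytes(row_count, bytes_per_row):
    """Spool space needed when copying row_count rows of the given width."""
    return row_count * bytes_per_row

# All columns of "Customer": 1000 rows x 100 bytes
print(spool_bytes(1000, 100))  # 100000

# Only "SalesId" from "Sales": 10000 rows x 8 bytes
print(spool_bytes(10000, 8))   # 80000
```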

How You Can Optimize the Join Operations

In this article, we will learn about the most commonly used join methods and their related join strategies:

(Figure: Overview of the Teradata join methods)

1. Teradata Merge Join – The Swiss Army Knife

When will it be used?

  • Is only used for equality joins

Preconditions

  • The rows to be joined have to be on a common AMP
  • Both spools must be sorted by the row hash calculated over the join column(s)

Merge Join Algorithms

Operation

Slow Path (Index is used on the left table)

  • The ROWHASH of each qualifying row in the left spool is used to lookup matching rows with identical ROWHASH in the right spool (with a binary search as both spools are sorted by row hash)

Fast Path (No index is used)

  • No index is used; both tables are full-table scanned, and the row hash match is done by alternating between the left and the right spool.
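
The fast-path row hash match scan can be sketched as a two-pointer pass over the two sorted spools. This is an illustrative Python model (row hashes are represented as plain integers), not Teradata internals:

```python
def rowhash_match_scan(left, right):
    """left/right: lists of (rowhash, row), each sorted by rowhash."""
    i = j = 0
    while i < len(left) and j < len(right):
        lh, rh = left[i][0], right[j][0]
        if lh < rh:
            i += 1            # advance the side with the smaller hash
        elif lh > rh:
            j += 1
        else:
            # Same hash: join the current left row with the whole right run
            j2 = j
            while j2 < len(right) and right[j2][0] == lh:
                yield (left[i][1], right[j2][1])
                j2 += 1
            i += 1

print(list(rowhash_match_scan([(1, 'a'), (2, 'b')], [(2, 'x')])))  # [('b', 'x')]
```

Because both inputs are sorted, each spool is scanned exactly once, which mirrors the "each data block is read only once" property discussed below.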

The Non-PPI Merge Join Strategies

Possible Join Preparation

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of one or both spools by the row hash of the join column(s)

The common AMP of rows from two spools being joined is defined by the join columns. This leaves us with three data distribution scenarios:

  1. The Primary Indexes (or any other suitable index) of both tables equal the join columns: No join preparation is needed as the rows to be joined are already on the common AMP and sorted by the row hash.

    Hint: The Primary Index of both tables needs to include all join columns, but it can have additional columns as well; these are applied as residual conditions. It's only important that all join columns are part of the Primary Index.
  2. Only the Primary Index (or any other suitable index) of one table matches the join columns: The rows of the second table have to be relocated to the common AMP
  3. Neither the Primary Index of the first table (or any other suitable index) nor the Primary Index (or any other suitable index) of the second table matches the join columns: The rows of both tables have to be relocated to the common AMP

Relocation of rows to the common AMP can be done by redistribution of the rows by the join column(s) ROWHASH or by copying the smaller table to all AMPs.

If the smaller table is copied to all AMPs, it must be sorted in the next step (calculating the rowhash of the join columns and sorting).

The larger table must also be hashed and sorted according to the join columns if the primary index does not already correspond to the join columns. In this case, the larger table is spooled AMP-local.
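
The choice between the two relocation strategies is a cost trade-off. A hypothetical back-of-the-envelope comparison of the bytes moved (ignoring rows that already sit on their target AMP, sort costs, and the many other factors the optimizer weighs):

```python
def redistribution_bytes(spool_size_bytes):
    # Redistribution sends each row to exactly one target AMP
    return spool_size_bytes

def duplication_bytes(spool_size_bytes, num_amps):
    # Duplication sends the complete spool to every AMP
    return spool_size_bytes * num_amps

small_spool, num_amps = 80_000, 10
print(redistribution_bytes(small_spool))        # 80000
print(duplication_bytes(small_spool, num_amps)) # 800000
```

Duplication moves more data, but it leaves the larger table untouched, which can pay off when the larger table would otherwise have to be redistributed and sorted.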

Performance Considerations

If both tables have the same primary index, this join is very efficient, since no rows have to be copied between AMPs.

This join method is also very robust in other respects, since the performance loss is bounded if, for example, the number of rows in the tables was underestimated due to incorrect or missing statistics.

Another advantage: Since both tables are sorted by rowhash, each data block only has to be copied once into the main memory.

The PPI Merge Join Strategies

All methods mentioned above can be used by the optimizer even if PPI tables are involved in joining.

For example, if a non-partitioned table and a PPI table are joined, the optimizer can convert the PPI table into a non-partitioned table and then use one of the strategies listed above.

On the other hand, the optimizer can also partition the non-partitioned table in the same way as the PPI table when there is a join between a PPI table and an un-partitioned table.

However, this is only possible if all partition columns are present in the non-partitioned table.

If at least one table is partitioned, the following special merge join methods are also available to the optimizer:

The Rowkey-Based Merge Join

The Rowkey-Based Merge Join can be a direct join, that is, neither of the two tables involved is spooled.

A prerequisite for a rowkey-based merge join is that both tables have the same primary index and the same partitioning. Furthermore, the join must be an equijoin across the primary index columns and all partition columns.

If these prerequisites are not fulfilled (and an equijoin is done), the optimizer can still decide on a rowkey-based merge join, but join preparation is required (i.e. no direct join).

For example, in the case of a join between a PPI table and an unpartitioned table, the optimizer can restructure the unpartitioned table to the same primary index and the same partitions as the PPI table and then perform the rowkey-based merge join.

The Rowkey-Based Merge Join and CHARACTER SETS

When joining two PPI tables that use character partitioning, it is important to make sure that the CHARACTER SETs of the partition columns are identical. Otherwise, the Rowkey-Based Merge Join cannot be a direct join; the table with CHARACTER SET LATIN must first be spooled to make the row keys of both tables identical.

Here is an example. Both tables are identical, except for the CHARACTER SET of the Column PartCol1.

CREATE MULTISET   TABLE MergeJoinChar1
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET LATIN

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;
CREATE MULTISET   TABLE MergeJoinChar2
(
	PK INTEGER NOT NULL,
	PartCol1 CHAR(10) CHARACTER SET UNICODE

) PRIMARY INDEX (PK)
PARTITION BY  RANGE_N(PartCol1 BETWEEN 'A','B','C','D' AND 'Z')
;
Explain SELECT *
FROM MergeJoinChar1 t01
INNER JOIN
MergeJoinChar2 t02
ON
 t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
  4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
     of an all-rows scan with no residual conditions into Spool 2
     (all_amps), which is built locally on the AMPs.  Then we do a SORT
     to partition Spool 2 by rowkey.  The size of Spool 2 is estimated
     with low confidence to be 73,420 rows (3,450,740 bytes).  The
     estimated time for this step is 0.29 seconds.
  5) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     Spool 2 (Last Use) by way of a RowHash match scan.  DWHPRO.t02 and
     Spool 2 are joined using a rowkey-based merge join, with a join
     condition of ("((TRANSLATE((PartCol1 )USING LATIN_TO_UNICODE))=
     DWHPRO.t02.PartCol1) AND (PK = DWHPRO.t02.PK)").  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with low confidence to be 97,878
     rows (4,796,022 bytes).  The estimated time for this step is 0.25
     seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.53 seconds.

An Example for a Rowkey Based Merge Join

In this example, the join is done using Primary Index (PK) and Partition Columns (PartCol1) which are the same in both tables. A direct rowkey-based merge join (without spooling tables) can be performed:

Explain SELECT *
FROM MergeJoin1 t01
INNER JOIN
MergeJoin2 t02
ON
 t01.PK = t02.PK
 AND t01.PartCol1 = t02.PartCol1;

  1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
     RowHash in all partitions to prevent global deadlock.
  2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
     in all partitions to prevent global deadlock.
  3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
     TD_MAP1 for read.
  4) We do an all-AMPs JOIN step in TD_MAP1 from DWHPRO.t02 by way of a
     RowHash match scan with no residual conditions, which is joined to
     DWHPRO.t01 by way of a RowHash match scan with no residual
     conditions.  DWHPRO.t02 and DWHPRO.t01 are joined using a
     rowkey-based merge join, with a join condition of (
     "(DWHPRO.t01.PartCol1 = DWHPRO.t02.PartCol1) AND (DWHPRO.t01.PK =
     DWHPRO.t02.PK)").  The result goes into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with low confidence to be 73,414 rows (2,716,318 bytes).
     The estimated time for this step is 0.15 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.

An Example for changing both PPI Tables to Non-PPI Tables

In this example, the join is performed on the primary index which is the same in both tables. The partition columns are not in the join condition. Both tables have the same size.

The optimizer decides to change both PPI tables to non-PPI tables (by spooling and sorting by rowhash of the primary index). A classic rowhash match scan then takes place.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     t01.PK = t02.PK;
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
   4) We execute the following steps in parallel.
        1) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t02 by
           way of an all-rows scan with no residual conditions into
           Spool 2 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 2 by the hash code of (
           DWHPRO.t02.PK).  The size of Spool 2 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.
        2) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by
           way of an all-rows scan with no residual conditions into
           Spool 3 (all_amps), which is built locally on the AMPs.  Then
           we do a SORT to order Spool 3 by the hash code of (
           DWHPRO.t01.PK).  The size of Spool 3 is estimated with high
           confidence to be 73,414 rows (1,541,694 bytes).  The
           estimated time for this step is 0.27 seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to Spool 3 (Last Use)
      by way of a RowHash match scan.  Spool 2 and Spool 3 are joined
      using a merge join, with a join condition of ("PK = PK").  The
      result goes into Spool 1 (group_amps), which is built locally on
      the AMPs.  The size of Spool 1 is estimated with low confidence to
      be 73,414 rows (2,716,318 bytes).  The estimated time for this
      step is 0.17 seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.44 seconds.

The Sliding Window Merge Join

The Sliding Window Merge Join is used to join a PPI table with:

  • An unpartitioned table
  • A PPI table that is not identically partitioned

The Sliding Window Merge Join is described in detail in a separate post.

An Example for a Teradata Sliding Window Merge Join

In this example, the join is again performed only on the primary index, which is the same in both tables. The only difference from the previous setup is that the second table is twice as big.

This time the optimizer turns the smaller of the two tables into a non-PPI table and executes a sliding window merge join.

Explain SELECT *
 FROM MergeJoin1 t01
 INNER JOIN
 MergeJoin2 t02
 ON
     t01.PK = t02.PK;
 1) First, we lock DWHPRO.t02 in TD_MAP1 for read on a reserved
      RowHash in all partitions to prevent global deadlock.
   2) Next, we lock DWHPRO.t01 in TD_MAP1 for read on a reserved RowHash
      in all partitions to prevent global deadlock.
   3) We lock DWHPRO.t02 in TD_MAP1 for read, and we lock DWHPRO.t01 in
      TD_MAP1 for read.
   4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.t01 by way
      of an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is built locally on the AMPs.  Then we do a SORT
      to order Spool 2 by the hash code of (DWHPRO.t01.PK).  The size of
      Spool 2 is estimated with high confidence to be 73,414 rows (
      1,541,694 bytes).  The estimated time for this step is 0.27
      seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of a RowHash match scan, which is joined to DWHPRO.t02 by way
      of a RowHash match scan with no residual conditions.  Spool 2 and
      DWHPRO.t02 are joined using a sliding-window merge join, with a
      join condition of ("PK = DWHPRO.t02.PK").  The result goes into
      Spool 1 (group_amps), which is built locally on the AMPs.  The
      size of Spool 1 is estimated with low confidence to be 146,828
      rows (5,432,636 bytes).  The estimated time for this step is 0.78
      seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 1.05 seconds.

2. Teradata Hash Join – The Sprinter, but only if executed in Memory

When will it be used?

  • Can only be used for equality joins

Preconditions

  • The rows to be joined have to be on a common AMP
  • The smaller spool is sorted by the ROWHASH calculated over the join column(s) and kept in the FSG cache
  • The larger spool stays unsorted

Operation

  • The larger spool is full table scanned row by row
  • The row hash of each row from the larger spool is searched in the smaller spool (with a binary search)
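
A minimal Python model of this probe (illustrative only): the smaller spool is sorted by row hash and held in memory, and each row of the larger spool is located in it with a binary search.

```python
import bisect

def hash_join(small_sorted, large):
    """small_sorted: [(rowhash, row)] sorted by rowhash; large: unsorted [(rowhash, row)]."""
    hashes = [h for h, _ in small_sorted]
    for lh, lrow in large:                  # single full scan of the large spool
        i = bisect.bisect_left(hashes, lh)  # binary search in the small spool
        while i < len(small_sorted) and hashes[i] == lh:
            yield (small_sorted[i][1], lrow)
            i += 1

print(list(hash_join([(1, 'a'), (2, 'b')], [(2, 'x'), (3, 'y')])))  # [('b', 'x')]
```

Note that the large spool never has to be sorted, which is exactly where the hash join saves costs compared with the merge join.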

Possible Join Preparation

  • Re-Distribution of the smaller spool by ROWHASH or
  • Duplication of the smaller spool to all AMPs
  • Sorting of the smaller spool

Performance Considerations

Hash joins perform better than merge joins when the smaller spool fits entirely into memory (the FSG cache), because the expensive sorting of the larger spool is avoided.


3. Teradata Nested Join – The Fastest, but scarce

Preconditions

  • Spool 1 allows a unique ROWHASH access (a unique index is defined)
  • Spool 2 allows any kind of ROWHASH access (a unique or non-unique index is defined)
  • Nested joins must have an equality join condition

Operation

  • The qualifying row of spool 1 is accessed by the usage of any unique index (UPI, USI). Rarely it can be a NUPI if the optimizer knows that only one row is returned.
  • The row is relocated to the AMPs owning the matching rows of Spool 2. A Nested Join uses only a few AMPs if the index of Spool 2 is a UPI, NUPI, or USI; it uses all AMPs if the index of Spool 2 is a NUSI.
  • The matching rows of Spool 2 are accessed via its index, and each is combined with the single row from Spool 1
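
The flow above can be modeled in a few lines of Python. The dictionaries stand in for index structures; names such as join_col are made up for the illustration:

```python
def nested_join(unique_index, lookup_key, second_index):
    """unique_index: {key: row} (UPI/USI-like); second_index: {join value: [rows]}."""
    row = unique_index.get(lookup_key)   # single-row access via the unique index
    if row is None:
        return []
    join_value = row["join_col"]
    # Combine the one qualifying row with all matching rows of the second table
    return [(row, other) for other in second_index.get(join_value, [])]

trader = {10: {"join_col": "A"}}
accounts = {"A": [{"v": 1}, {"v": 2}]}
print(len(nested_join(trader, 10, accounts)))  # 2
```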

Possible Join Preparation

  • None

Performance Considerations

The nested join is the most efficient join method because only one or a few AMPs are involved. It is used mainly for tactical workloads.

4. Teradata Product Join – The disliked Guest

When will it be used?

  • The join condition is an inequality condition
  • Join conditions are combined with OR
  • A referenced table is not used in the join condition
  • The Product Join is the cheapest join method

Preconditions

  • The rows to be joined have to be on a common AMP
  • Neither of the two spools needs to be sorted

Operation

  • A full table scan is done on the smaller spool and
  • Each qualifying row of spool 1 is compared with each row of spool 2
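
A product join is essentially a nested loop with an arbitrary predicate. The sketch below (illustrative Python) makes the n * m comparison count obvious:

```python
def product_join(small, large, condition):
    """Compare every row of the smaller spool with every row of the larger one."""
    return [(s, l) for s in small for l in large if condition(s, l)]

# An inequality condition -- the classic product-join trigger
print(product_join([1, 5], [2, 3, 9], lambda a, b: a < b))
# [(1, 2), (1, 3), (1, 9), (5, 9)]
```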

Possible Join Preparation

  • Re-Distribution of one or both spools by ROWHASH or
  • Duplication of the smaller spool

Performance Considerations

The costs of a product join are usually very high since n*m comparisons must be carried out (n = number of rows in table1, m = number of rows in table2).

Another disadvantage of product joins, apart from the large number of comparisons, is that data blocks must be copied from disk into main memory several times if the small table cannot be kept completely in main memory.


Cracking The Teradata Join Estimation Code

What is Teradata Join Estimation?

This article shows how Teradata Join Estimation works when no statistics have been collected. We show which heuristics are used to estimate the number of rows, and why it is important to always collect statistics on all join columns.

Teradata Join Estimation Heuristics

The worst case is the following: two tables are joined, and neither has collected statistics. To make sure that there are not even summary statistics at the table level, we create both tables from scratch and insert rows. The two tables are filled from a dummy table that serves as the basis in all our examples:

CREATE SET  TABLE DWHPRO.STATS1 ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      PI INTEGER NOT NULL,
      COL1 INTEGER NOT NULL,
      COL2 INTEGER NOT NULL)
PRIMARY INDEX ( PI );

We insert 50 rows into the table stats1

CREATE SET  TABLE DWHPRO.STATS2 ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      PI INTEGER NOT NULL,
      COL1 INTEGER NOT NULL,
      COL2 INTEGER NOT NULL)
PRIMARY INDEX ( PI );

We insert 150 rows into the table stats2

As the following SQL shows, there are no statistics on either table. The Optimizer must, therefore, estimate the cardinality of the two tables on the basis of a random AMP sample over the primary index:

SHOW STATISTICS VALUES ON STATS1;
[3624] There are no statistics defined for the table  
SHOW STATISTICS VALUES ON STATS2;
[3624] There are no statistics defined for the table  

We will perform a simple INNER JOIN of the two tables on a column that is not indexed:

EXPLAIN SELECT * FROM STATS2 inner JOIN STATS1 ON STATS1.COL1 = STATS2.COL1;

 1) First, we lock DWHPRO.STATS1 in TD_MAP1 for read on a reserved
      RowHash to prevent global deadlock.
   2) Next, we lock DWHPRO.STATS2 in TD_MAP1 for read on a reserved
      RowHash to prevent global deadlock.
   3) We lock DWHPRO.STATS1 in TD_MAP1 for read, and we lock
      DWHPRO.STATS2 in TD_MAP1 for read.
   4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.STATS1 by
      way of an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is duplicated on all AMPs in TD_Map1.  The size
      of Spool 2 is estimated with low confidence to be 96 rows (2,400
      bytes).  The estimated time for this step is 0.00 seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of an all-rows scan, which is joined to DWHPRO.STATS2 by way
      of an all-rows scan with no residual conditions.  Spool 2 and
      DWHPRO.STATS2 are joined using a dynamic hash join, with a join
      condition of ("COL1 = DWHPRO.STATS2.COL1").  The result goes into
      Spool 1 (group_amps), which is built locally on the AMPs.  The
      size of Spool 1 is estimated with no confidence to be 600 rows (
      27,000 bytes).  The estimated time for this step is 0.01 seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.01 seconds.

From the execution plan, we can see that the estimated number of rows in the result set is 600. Furthermore, we can see that the estimated number of rows in table Stats1 is 96. However, we have to halve this number because the table is duplicated to all AMPs (our test system has 2 AMPs). That is, the optimizer estimates the number of rows in table Stats1 at 48, which is quite close to the real number of 50 rows.

To get the number of estimated rows in table Stats2 we can use the following trick:

EXPLAIN SELECT * FROM STATS2;
1) First, we lock DWHPRO.STATS2 in TD_MAP1 for read on a reserved
      RowHash to prevent global deadlock.
   2) Next, we lock DWHPRO.STATS2 in TD_MAP1 for read.
   3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.STATS2 by
      way of an all-rows scan with no residual conditions into Spool 1
      (group_amps), which is built locally on the AMPs.  The size of
      Spool 1 is estimated with low confidence to be 156 rows (5,148
      bytes).  The estimated time for this step is 0.01 seconds.
   4) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.01 seconds.

We can see from the execution plan that the number of estimated rows is 156.

How does the optimizer arrive at the estimate of 600 rows for the result set? The answer is that the following heuristic is applied:

SQRT(number of rows of the bigger table) * (number of rows of the smaller table), rounded up to the nearest whole number. (Many thanks to Dieter Noeth, who gave me this information in the Teradata forum.)

Let's do the math for our example:

SQRT(156) = 12.489996, and 12.489996 * 48 = 599.519808, which is rounded up to 600.
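
The same arithmetic, written out in Python (the halving reflects that the 96-row spool estimate was duplicated to the 2 AMPs of the test system):

```python
import math

def heuristic_join_estimate(bigger_rows, smaller_rows):
    """SQRT(rows of the bigger table) * rows of the smaller table, rounded up."""
    return math.ceil(math.sqrt(bigger_rows) * smaller_rows)

smaller = 96 // 2  # duplicated spool estimate divided by the 2 AMPs -> 48
print(heuristic_join_estimate(156, smaller))  # 600
```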

If there are at least current statistics on the cardinality of the tables, these are used for the Heuristic:

COLLECT STATS COLUMN(PI) ON STATS1;
COLLECT STATS COLUMN(PI) ON STATS2;

EXPLAIN SELECT * FROM STATS2 inner JOIN STATS1 ON STATS1.COL1 = STATS2.COL1;
 1) First, we lock DWHPRO.STATS1 in TD_MAP1 for read on a reserved
      RowHash to prevent global deadlock.
   2) Next, we lock DWHPRO.STATS2 in TD_MAP1 for read on a reserved
      RowHash to prevent global deadlock.
   3) We lock DWHPRO.STATS1 in TD_MAP1 for read, and we lock
      DWHPRO.STATS2 in TD_MAP1 for read.
   4) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.STATS1 by
      way of an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is duplicated on all AMPs in TD_Map1.  The size
      of Spool 2 is estimated with high confidence to be 100 rows (
      2,500 bytes).  The estimated time for this step is 0.00 seconds.
   5) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
      way of an all-rows scan, which is joined to DWHPRO.STATS2 by way
      of an all-rows scan with no residual conditions.  Spool 2 and
      DWHPRO.STATS2 are joined using a dynamic hash join, with a join
      condition of ("COL1 = DWHPRO.STATS2.COL1").  The result goes into
      Spool 1 (group_amps), which is built locally on the AMPs.  The
      size of Spool 1 is estimated with no confidence to be 613 rows (
      27,585 bytes).  The estimated time for this step is 0.01 seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1.  The total estimated time is 0.01 seconds.

The estimate for the result set has increased to 613 rows. But has it improved? Let's just run the query:

SELECT * FROM STATS2 inner JOIN STATS1 ON STATS1.COL1 = STATS2.COL1
-> SELECT completed. 0 rows returned.

As we can see, the query does not return any rows at all. The estimate has thus actually become slightly worse with the table-level statistics.

Comparing the two execution plans can be confusing: the retrieve step in the first plan (without statistics) has only low confidence, while the retrieve step in the second plan (with statistics) has high confidence; yet the low-confidence estimate is the better one.

This was the first part about Join Estimations. In the next articles, we will take statistics step by step and see how this affects the optimizer's estimates.


Joins as a Main Target for Teradata Tuning

Joining tables is one of the most expensive operations of an SQL statement. This is because each join requires the rows to be joined on a common AMP.

If the two tables do not have the same primary index, it is necessary to copy the rows of one or both tables. This can cause a huge number of IOs in large tables.

For each join, the Teradata Optimizer can choose between different types of joins, each suitable for a specific initial scenario. The main goal of the Teradata Optimizer is to minimize resource consumption (IOs, CPU consumption, etc.) for the entire execution plan.

General Join Optimization Considerations

The optimizer makes use of available statistics in order to determine the optimal join plan.

The optimizer's cost estimation for each join is based on the following factors:

  • The expected cardinality of the output spool (after the join)
  • The expected cardinality of the input spools after WHERE conditions have been applied
  • The selected column list of each input spool and the average row size

Usually, only the selected columns of each input spool are taken over into the resulting spool. The only exception is when a permanent table is joined directly without spooling; in this case, all columns of this table are taken over into the resulting spool.

Based on the above observations we can draw the following conclusions regarding performance:

  • The important statistics for each join are the ones on the join columns and on any WHERE condition columns.
  • Whenever possible, we should apply WHERE conditions in our queries in order to reduce the cardinality of the input tables.
  • Furthermore, we should only select the columns required in the result set and avoid "SELECT * FROM" queries.


Teradata Merge Join vs. Product Join

When comparing the performance of different Teradata join types, we have to consider the total costs of the join strategy chosen for a particular join type.

A join plan can consist of several steps, and each of them is causing a certain amount of costs (I/Os and CPU seconds). The total join costs depend on:

  • The number of required comparisons
  • The number of times each data block is moved from the disk into memory
  • The amount of data distributed (duplicated or rehashed)
  • The number of sorted rows

While the Merge Join causes significantly fewer comparisons than the Product Join, the join preparation in the form of data redistribution and sorting can be more expensive.

For the Product Join, the smaller table is copied to all AMPs and that's all that has to be done. Still, as each row of one table is compared with each row of the other table, costs of comparison are high.

Furthermore, costs will increase dramatically if the small table doesn't fit into each AMPs memory, as the same data blocks have to be read from disk more than once.

To summarize the cost characteristics of a Product Join:

  • High costs for comparison
  • Low costs for join preparation

The costs characteristics of a Merge Join are:

  • No costs (if the primary index of both tables is the same) to huge costs for join preparation
  • Low costs for comparison

To some extent, costs of these two join types behave in the opposite direction.

For the following query, the optimizer used a Merge Join:

SELECT Customer.ISO, Product.Desc
FROM Customer INNER JOIN Product ON Customer.ProductCd = Product.ProductCd;

4)We do an all-AMPs RETRIEVE step from Customer by way of an all-rows scan with no residual conditions into Spool 2, which is redistributed by hash code to all AMPs. Then, we do a SORT to order Spool 2 by row hash. The result spool file will not be cached in memory. The size of Spool 2 is estimated with low confidence to be 10,000,100 rows. The estimated time for this step is 30 minutes and 12 seconds.
5) We do an all-AMPs JOIN step from Product by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). Product and Spool 2 are joined using a merge join, with a join condition of (“Product.ProductCd=ProductCd”). The result goes into Spool 1, which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 10,000,000 rows. The estimated time for this step is 11 minutes and 20 seconds.

The Join preparation step (redistributing and sorting) takes more than 30 minutes. The join step needs about 11 minutes.

When I improved the query by adding an additional WHERE condition predicate on the Product table, the Optimizer switched to a Product Join:

SELECT Customer.ISO, Product.Desc
FROM Customer INNER JOIN Product ON Customer.ProductCd = Product.ProductCd
WHERE Product.Price <= 1000;

4) We do an all-AMPs RETRIEVE step from Product by way of an all-rows scan with a condition of (“Product.Price <= 1000”) into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 100 rows. The estimated time for this step is 0.15 seconds.

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Customer. Spool 2 and Customer are joined using a product join, with a join condition of (“ProductCd = Customer.ProductCd”). The input table Customer will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 800,000 rows. The estimated time for this step is 18 minutes and 17 seconds.

The Join preparation step for the Product Join (duplicating) takes a ridiculous 0.15 seconds. The joining step needs about 18 minutes.

The Product Join performs significantly better than the Merge Join.

In our example, the WHERE condition predicate can only be applied to the Product table. A Merge Join would still incur the same preparation costs.

If we identify a Product Join in the execution plan, it is often caused by a mistake in the query (e.g., the wrong use of a table alias, or a table that is referenced but missing from the join condition). Still, for small tables, a Product Join might be the better choice, performing better than a Merge Join.
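A typical query mistake that silently produces a Product Join is a join condition that never references the second table, for example through a wrong alias. A hedged sketch (the bug is intentional; the column names follow the example above):

```sql
-- Intended: join Sales to Product on ProductCd.
-- Bug: the ON condition uses alias t01 on both sides, so it is always true,
-- and every Sales row is combined with every Product row (a product join).
SELECT t01.SalesId, t02.Desc
FROM Sales t01
INNER JOIN Product t02
ON t01.ProductCd = t01.ProductCd;
```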

See also:
Dramatically improve Performance with the Teradata Nested Join

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.


Teradata Soft Referential Integrity

What is Referential Integrity?

Referential integrity defines relationships between tables based on a primary key and a foreign key. A foreign key in the referencing table refers to the primary key (or a uniquely indexed column) of the referenced table.

Referential integrity is a robust mechanism to ensure data consistency in a database.

What is Soft Referential Integrity?

Soft Referential Integrity is a hint to the Optimizer that a relationship between tables exists. The Optimizer relies on this information; it does not enforce referential integrity but assumes that it is ensured by the user.

How is Soft Referential Integrity used?

Soft Referential Integrity is used for INNER JOIN Elimination in views.

If a view contains an inner join between two tables with soft referential integrity defined, and we select only columns from the referencing table (the foreign key side), the inner join can be eliminated. Here is an example:

REPLACE VIEW TradeInfo AS
SELECT
t01.TraderId,t01.TraderName,t01.TraderCountry,
t02.AccountCurrency,t02.BaseCurrency
FROM
Trader t01
INNER JOIN
TradeAccount t02
ON
t01.TraderId = t02.TraderId
;

We assume that soft referential integrity is defined on column TraderId, with "TradeAccount" as the referenced (primary key) table and "Trader" as the referencing (foreign key) table, matching the DDL shown below.

If we execute the below SQL statement, the Optimizer will not join to table "TradeAccount", as all referenced columns come from table "Trader", and the defined soft referential integrity guarantees that each row in "Trader" has a matching row in "TradeAccount". Therefore, there is no need to "filter" non-matching rows with the INNER JOIN.
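The statement itself is not shown in the original; a query of this kind, selecting only "Trader" columns through the view, triggers the join elimination:

```sql
-- Only columns of the referencing table "Trader" are selected,
-- so the Optimizer can skip the join to "TradeAccount" entirely.
SELECT TraderId, TraderName, TraderCountry
FROM TradeInfo;
```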

We could use "hard" referential integrity to achieve the same join elimination, but enforcing it causes performance overhead we may want to avoid. Teradata soft referential integrity improves performance without that overhead.

There is one important fact about “soft” referential integrity, which has to be pointed out: The Optimizer trusts that we ensure the integrity of the relationship!

The syntax to define soft referential integrity is as follows. The data type of the column has to be the same in both tables, and the referenced column has to be uniquely indexed:

CREATE MULTISET TABLE Trader
(
TraderId INTEGER
REFERENCES WITH NO CHECK OPTION TradeAccount (TraderId),
TraderName VARCHAR(255),
TraderCountry CHAR(03)
) UNIQUE PRIMARY INDEX (TraderId);
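Soft referential integrity can also be added to an existing table. A sketch with ALTER TABLE, assuming both tables already exist and TradeAccount.TraderId is uniquely indexed (the constraint name is illustrative):

```sql
ALTER TABLE Trader
ADD CONSTRAINT fk_Trader_TradeAccount
FOREIGN KEY (TraderId)
REFERENCES WITH NO CHECK OPTION TradeAccount (TraderId);
```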



Tactical Workload Tuning on Teradata

Tactical workload tuning is a discipline of its own on Teradata, requiring a unique skill set and tools.

In this article, I will show you the skills needed to optimize tactical workload.

You will learn how to use the proper tools and the “tactical workload skill set.”

The Difference between Tactical and Decision Support Workload

Decision support workload typically involves full table scans (FTS) and all-row joins, usually between big tables.
Indexing (NUSI, USI, etc.) can improve performance, but more often all table rows are accessed.

Here is an example of a decision support query:

SELECT CustomerGroupName, SUM(Revenue)
FROM
Sales t01
INNER JOIN
CustomerGroup t02
ON
t01.GroupCd = t02.GroupCd
GROUP BY 1;

Teradata Tactical Workload Tuning

The tactical workload is characterized by single-row access or access to a limited number of rows. Here is an example of a tactical query:

SELECT *
FROM Sales
WHERE SalesId = 100;

Optimized tactical queries need indexed access paths, such as a UPI, NUPI, NUSI, USI, Join Index, or a single-row partition. The primary goal is an execution plan in which each step delivers only one or a few rows.
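For example, a unique secondary index (USI) provides such a direct access path. A sketch, assuming SalesId is unique but not already the primary index of the Sales table:

```sql
-- A USI gives the tactical query a direct (at most two-AMP) access path.
CREATE UNIQUE INDEX (SalesId) ON Sales;
```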


The tactical selection criteria of your query have to be applied as early as possible in the execution plan.
With a proper query design, tactical query execution is stable, and performance does not decrease as table cardinality grows, while decision support queries usually become slower with a growing amount of data:

[Figure: tactical vs. decision support query performance as data volume grows]

Tactical query performance does not improve with the number of Teradata nodes. Adding more nodes to your Teradata system will not have any impact on the performance of your single tactical query. Nevertheless, adding nodes to your system allows more tactical queries to be executed at the same time.

The performance of decision support queries improves with the number of nodes. Assuming perfect data distribution (no skew), doubling the number of nodes can speed up a query by a factor of two (these are theoretical numbers; no system works 100% in parallel and entirely without skew).

The Tactical Workload Toolkit

Besides optimizing the retrieve steps with indexed access to single rows (USI, UPI, NUSI, Join Index, etc.), we have to optimize the joins. Join optimization means minimizing the number of AMPs involved in the join operations.

For example, the execution plan of the query below shows that the selected row is accessed from only one AMP (single-AMP join step):

SELECT * FROM Sales s INNER JOIN SalesDetails sd ON s.SalesId = sd.SalesId WHERE s.SalesId = 100;

1) First, we do a single-AMP JOIN step from Sales.s by way of the primary index "Sales.s.SalesId = 100"
with no residual conditions, which is joined to Sales.SalesDetails by way of the primary index "Sales.sd.SalesId = 100"
with a residual condition of ("Sales.SalesDetails.SalesId = 100"). Sales.Sales and Sales.SalesDetails
are joined using a merge join, with a join condition of ("Sales.Sales.SalesId = Sales.SalesDetails.SalesId").
The result goes into Spool 1 (one-amp), which is built locally on that AMP.
The size of Spool 1 is estimated with low confidence to be 151 rows.

The above query will always perform identically (as long as the execution plan does not change), and this is what we want to achieve in tactical workload tuning: stable, direct access to the data.

While the above example was built around a Merge Join, the most suitable join type for tactical workload is the Nested Join.

When designing your physical data model, you should consider a design which allows the usage of Nested Joins.

Business users often need to retrieve information based on the natural key of a table, not the surrogate keys. It's always a good idea to keep the natural keys in your physical data model when dealing with tactical workload!

Here is an example of a physical data model which allows the usage of Nested Joins.

CREATE TABLE Sales
(
ProductId INTEGER NOT NULL,
SalesId INTEGER NOT NULL
) PRIMARY INDEX (ProductId);

CREATE TABLE Products
(
ProductId INTEGER NOT NULL,
ProductCode CHAR(10)
) PRIMARY INDEX (ProductId)
UNIQUE INDEX (ProductCode);

In our example above, the data is accessed via the natural key “ProductCode” and not the surrogate key (ProductId). Therefore, we use a USI to create a direct data access path via the natural key:

SELECT * FROM Products p INNER JOIN Sales s ON p.ProductId = s.ProductId WHERE ProductCode = 'Private';

1) First, we do a two-AMP JOIN step from Sales.p by way of unique index # 4
"Sales.p.ProductCode = 'Private'" with no residual conditions, which is joined to Sales.s by
way of unique index # 4 "Sales.s.ProductId = Sales.p.ProductId" with no residual conditions.
Sales.p and Sales.s are joined using a nested join, with a join condition of ("(1=1)").
The result goes into Spool 1 (one-amp), which is built locally on the AMPs.

As you can see, the tactical query is resolved with a Nested Join and the USI on column ProductCode.

Forcing all Rows to one AMP

Another interesting tool for executing tactical queries against small tables is to force all rows onto one AMP: add a column to the small table and populate it with a single value. This column has to be the primary index of the table.

Populating the primary index column with just one value forces all table rows onto the same AMP and, if the table is sufficiently small, even into the same data block.

If everything works as expected, this results in a single-AMP retrieve step instead of an all-AMP scan.

If many concurrent tactical queries are executed against the table at the same time, the entire table will stay in memory, avoiding any access to the disk drive.

If you use this approach for several tables, choose a different value for the "dummy" primary index column of each table to avoid overloading a single AMP.
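The "all rows on one AMP" design can be sketched like this (table and column names are illustrative):

```sql
-- Small lookup table: the constant NUPI value forces all rows onto one AMP,
-- and for a tiny table typically into the same data block.
CREATE MULTISET TABLE CountryCodes
(
DummyPI BYTEINT NOT NULL DEFAULT 1,  -- always 1 for this table
CountryCd CHAR(03) NOT NULL,
CountryName VARCHAR(255)
) PRIMARY INDEX (DummyPI);

-- For a second small table, use a different constant (e.g. DEFAULT 2)
-- so the tables do not pile up on the same AMP.
```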

A Case Study: Tactical Workload and Join Indexes

Recently, I was involved in an optimization project for a web-frontend tool with Teradata as the backend database.

During the development process, nobody had considered that the queries executed against the database are tactical workload. Most of the workload the tool generated was full table scans.

The web tool had been tested on a development system, but when it went into production and dozens of people started to work in parallel, together with all the other workload, performance degraded severely. The real problem was not the size of the involved tables, as all tables were quite small.

The real problem was a mix of end-user expectations when navigating in the web frontend, and the setup of the workload management, which did not classify the tool queries as tactical workload.

Users expected an average response time of just a few seconds, which could not be achieved with full table scans on a loaded system. (I understood the users; who wants to wait 30 seconds to save the information of just one client?)

Run times for the main activities were at best 1-2 seconds but increased to a couple of minutes on a busy system. Furthermore, workload delays added to the overall response times.

I began to add unique secondary indexes, but as the tables were tiny, the optimizer decided that a full table scan of the base table was cheaper than the indexed access.

Finally, it turned out that join indexes were the solution. As the tool required different access paths to the data, I created several 1:1 join indexes per table, one for each access path, altering only the primary index of the join index.
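Such a 1:1 join index repeats the base table's columns under a different primary index, giving the Optimizer a direct access path per lookup pattern. A sketch (table and column names are illustrative):

```sql
-- One join index per required access path; only the primary index differs.
CREATE JOIN INDEX JI_Clients_ByName AS
SELECT ClientId, ClientName, ClientCountry
FROM Clients
PRIMARY INDEX (ClientName);
```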

When adding join indexes, we have to keep an eye on their impact on update, insert, and delete statements. In my case, this was no problem, as the tables were small.

What I experienced was another issue: The huge number of join indexes caused a lot of deadlocks.

There are several ways to overcome this problem. We can, for example, serialize the user session activities by using a "serialization" table: each session has to acquire a write lock on this "serialization" table before it can apply any changes to the data.
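Such a serialization table can be sketched like this (table and column names are illustrative; BT/ET assumes Teradata session mode):

```sql
-- One row per resource whose changes must be serialized.
CREATE MULTISET TABLE SessionLock
(
ResourceId INTEGER NOT NULL
) PRIMARY INDEX (ResourceId);

BT;  -- begin transaction
-- Acquire the write lock on the serialization row first:
UPDATE SessionLock SET ResourceId = ResourceId WHERE ResourceId = 1;
-- ... apply the actual INSERT/UPDATE/DELETE statements here ...
ET;  -- end transaction, releasing the lock
```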

In my project, I was able to minimize deadlocks by rewriting the critical pieces of code, reducing the number of changes (updates, deletes, inserts) done on the tables.

The last hint: Don't forget to collect statistics on the join indexes. Otherwise, they may not be used!
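Collecting statistics on a join index works like on a base table. A sketch, assuming a join index named JI_Clients_ByName with a ClientName column as its primary index:

```sql
-- Without statistics, the Optimizer may ignore the join index.
COLLECT STATISTICS COLUMN (ClientName) ON JI_Clients_ByName;
```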

See also:
Teradata Golden Tuning Tipps 2017 – Take your Skills to the next Level!
6 Golden Rules for Teradata SQL Tuning
The Teradata Join Index Guide – We leave no Questions unanswered!
Dramatically improve Performance with the Teradata Nested Join

