Teradata Merge JoinWhen comparing the performance of different Teradata join types, we have to consider the total costs of the join strategy chosen for a particular join type.

A join plan can consist of several steps, and each of them is causing a certain amount of costs (I/Os and CPU seconds). The total join costs depend on:

  • The number of required comparisons
  • The number of times each data block is moved from the disk into memory
  • The number of distributed data (duplicated or rehashed)
  • The number of sorted rows

While the Merge Join causes significantly fewer comparisons than the Product Join, the join preparation in the form of data redistribution and sorting can be more expensive.

For the Product Join, the smaller table is copied to all AMPs and that’s all that has to be done. Still, as each row of one table is compared with each row of the other table, costs of comparison are high.

Furthermore, costs will increase dramatically if the small table doesn’t fit into each AMPs memory, as the same data blocks have to be read from disk more than once.

To summarize the cost characteristics of a Product Join:

  • High costs for comparison
  • Low costs for join preparation

The costs characteristics of a Merge Join are:

  • No costs (if the primary index of both tables is the same) to huge costs for join preparation
  • Low costs for comparison

To some extent, costs of these two join types behave in the opposite direction.

For the following query, the optimizer used a Merge Join:

SELECT Customer.ISO, Product.Desc
FROM Customer INNER JOIN Product ON Customer.ProductCd = Product.ProductCd;

4)We do an all-AMPs RETRIEVE step from Customer by way of an all-rows scan with no residual conditions into Spool 2, which is redistributed by hash code to all AMPs. Then, we do a SORT to order Spool 2 by row hash. The result spool file will not be cached in memory. The size of Spool 2 is estimated with low confidence to be 10,000,100 rows. The estimated time for this step is 30 minutes and 12 seconds.
5) We do an all-AMPs JOIN step from Customer by way of a RowHash match scan with no residual conditions, which is joined to Spool 2 (Last Use). Product and Spool 2 are joined using a merge join, with a join condition of (“Product.ProductCd=ProductCd”). The result goes into Spool 1, which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with low confidence to be 10,000,000 rows. The estimated time for this step is 11 minutes and 20 seconds.

The Join preparation step (redistributing and sorting) takes more than 30 minutes. The join step needs about 11 minutes.

When I improved the query by adding an additional WHERE condition predicate on the Product table, the Optimizer switched to a Product Join:

SELECT Customer.ISO, Product.Desc
FROM Customer INNER JOIN Product ON Customer.ProductCd = Product.ProductCd
WHERE Product.Price <= 1000;

4) We do an all-AMPs RETRIEVE step from Product by way of an all-rows scan with a condition of (“Product.Price <= 1000”) into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 100 rows. The estimated time for this step is 0.15 seconds.

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Customer. Spool 2 and Customer are joined using a product join, with a join condition of (“ProductCd = Customer.ProductCd”). The input table Customer will not be cached in memory, but it is eligible for synchronized scanning. The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 800,000 rows. The estimated time for this step is 18 minutes and 17 seconds.

The Join preparation step for the Product Join (duplicating) takes a ridiculous 0.15 seconds. The joining step needs about 18 minutes.

The Product Join performs significantly better than the Merge Join.

In our example, the WHERE condition predicate can only be applied to the Product table. A merge Join still would cause the same costs.

If we identify a Product Join in the execution plan, often this is caused by some mistake we did in the query ( the wrong usage of table alias, a table referenced, but missing in join condition). Still, for small tables, a Product Join might be the better choice, performing better than a Merge Join.

See also:
Dramatically improve Performance with the Teradata Nested Join

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.
Our Reader Score
[Total: 3    Average: 4.7/5]
Teradata Merge Join vs. Product Join written by Roland Wenzlofsky on March 18, 2017 average rating 4.7/5 - 3 user ratings

2 COMMENTS

  1. Hello.

    Thank you for the article.

    I have some small addition: a merge join may contain a product join inside.
    Let’s assume that we have the same NonUniquePrimaryIndex( Store_id and good_id) for 2 different tables.
    Each good may be sold many times in some store. Let’s assume that it is sold 1000 times in some store but in different days(which is not a part of PI). All 1000 records will have the same rowhash, stored on the same APM, and will have a different uniques value in its rowid.

    If we try to “make” a merge join of two tables then teradata do real merge join algorithm between different row hashes(for example if some store is presented in one table but does not presented in second then the rowhashes is not matched and that store will be skipped) but it has to make the product join between rows shared the same rowhash value(so for some particular pair of Store_id and Good_id all rows of first table will be product joined to the corresponding rows from second table).

    This is the reason why the performance of merge join may be decreased in some cases.
    Another popular reason is a different partitioning between joined tables but it not a theme of this article.

    Best regards,
    Aleksei Svitin.

LEAVE A REPLY

Please enter your comment!
Please enter your name here