The Teradata NESTED join and the impact of the Primary Index

3
738
teradata nested join

teradata nested join

The most used join strategy is the MERGE join (or its equivalent  for partitioned tables, the row key based merge join). A join strategy which rarely puts in appearance is the Teradata NESTED join.

It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.

We will analyze at first, how a NESTED join works and what is the enabling technology.

What enables the nested join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.

Unique index search could be a unique primary index access or a unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).

In a quick and dirty environment, where as a matter of principle unique primary indexes are avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.

In the following test scenario we will show how significant the impact on a system could be:

I created two test tables, which allow the use of the NESTED join strategy
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B  INTEGER) PRIMARY INDEX (ATTRIB_B );

I added some random records to both tables

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1)  FROM SYS_CALENDAR.CALENDAR;

Finally, I designed a query which fitted the requirements for a NESTED join ideally:

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON  t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was, with only two disks accessed, the result set could be retrieved.

Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:

DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) PRIMARY INDEX (PK);

The optimizer had to make another plan. It decided to do a MERGE join.

The second execution plan is evil. 228 disk accesses are needed:  2 disk accesses against 228! A huge difference.

Conclusion: 

Don’t omit uniqueness of primary indexes just for questionable reasons.

Our Reader Score
[Total: 9    Average: 3.8/5]
The Teradata NESTED join and the impact of the Primary Index written by Roland Wenzlofsky on April 3, 2014 average rating 3.8/5 - 9 user ratings

3 COMMENTS

  1. Hi Roland ,
    Thanks for another great post.

    Do you know the exact reason why 2 disk are accessed in first case and 228 in second ?
    How the process works in 2 disk scenario and different from 28 one.

LEAVE A REPLY

Please enter your comment!
Please enter your name here