The Teradata NESTED join and the impact of the Primary Index

Roland Wenzlofsky

April 3, 2014

minutes reading time


The most used join strategy is the MERGE join (or its equivalent for row partitioned tables, the row key-based merge join). A join strategy that rarely puts in appearance is the Teradata NESTED join.

It needs a lot of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.

We will analyze at first how a NESTED join works and what is the enabling technology.

What enables the nested to join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.

A unique index search could be unique primary index access or unique secondary index access. The second table must be accessible by any index (regardless of uniqueness).

In a quick and dirty environment, where a matter of principle, unique primary indexes are avoided to mask duplicate key problems, many possibilities to execute a nested join are missed.

In the following test scenario, we will show how significant the impact on a system could be:

I created two test tables, allowing the NESTED join strategy to be used.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B  INTEGER) PRIMARY INDEX (ATTRIB_B );

I added some random records to both tables.

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1)  FROM SYS_CALENDAR.CALENDAR;

Finally, I designed a query that fitted the requirements for a NESTED join ideally:

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON  t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was that the result set could be retrieved with only two disks accessed.

Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer from applying the NESTED join strategy:

DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) PRIMARY INDEX (PK);

The Optimizer had to make another plan. It decided to do a MERGE join.

The second execution plan is evil. Two hundred twenty-eight disk accesses are needed:  2 disk accesses against 228! A huge difference.

Conclusion: 

Don’t omit the uniqueness of primary indexes just for questionable reasons.

  • Hi Roland,
    Thanks for another great post.

    Do you know the exact reason why 2 disks are accessed in the first case and 228 in second?
    How the process works in 2 disk scenarios and different from 28 one.

  • Hi Roland,

    How to find out the number of disk accesses or I/Os done in satisfying a particular query?

    Regards,
    Virendra

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >