3

The Teradata NESTED join and the impact of the Primary Index

The most used join strategy is the MERGE join (or its equivalent  for partitioned tables, the row key based merge join). A join strategy which rarely puts in appearance is the Teradata NESTED join.

It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.

We will analyze at first, how a NESTED join works and what is the enabling technology.

What enables the nested join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.

Unique index search could be a unique primary index access or a unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).

In a quick and dirty environment, where as a matter of principle unique primary indexes are avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.

In the following test scenario we will show how significant the impact on a system could be:

I created two test tables, which allow the use of the NESTED join strategy
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B  INTEGER) PRIMARY INDEX (ATTRIB_B );

I added some random records to both tables

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1)  FROM SYS_CALENDAR.CALENDAR;

Finally, I designed a query which fitted the requirements for a NESTED join ideally:

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON  t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was, with only two disks accessed, the result set could be retrieved.

Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:

DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) PRIMARY INDEX (PK);

The optimizer had to make another plan. It decided to do a MERGE join.

The second execution plan is evil. 228 disk accesses are needed:  2 disk accesses against 228! A huge difference.

Conclusion: 

Don’t omit uniqueness of primary indexes just for questionable reasons.

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Virendra says:

    Hi Roland,

    How to find out the number of disk accesses or I/Os done in satisfying a particular query?

    Regards,
    Virendra

  • You can find this information in the DBC.DBQLOGTBL. You will have logical and phyiscal IOs available.

  • Avatar Niraj says:

    Hi Roland ,
    Thanks for another great post.

    Do you know the exact reason why 2 disk are accessed in first case and 228 in second ?
    How the process works in 2 disk scenario and different from 28 one.

  • >