The Teradata NESTED join and the impact of the Primary Index
The most used join strategy is the MERGE join (or its equivalent for partitioned tables, the row key based merge join). A join strategy which rarely puts in appearance is the Teradata NESTED join.
It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.
We will analyze at first, how a NESTED join works and what is the enabling technology.
What enables the nested join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.
Unique index search could be a unique primary index access or a unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).
In a quick and dirty environment, where as a matter of principle unique primary indexes are avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.
In the following test scenario we will show how significant the impact on a system could be:
I created two test tables, which allow the use of the NESTED join strategy
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B INTEGER) PRIMARY INDEX (ATTRIB_B );
I added some random records to both tables
INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1) FROM SYS_CALENDAR.CALENDAR;
Finally, I designed a query which fitted the requirements for a NESTED join ideally:
SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;
The result was, with only two disks accessed, the result set could be retrieved.
Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:
DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) PRIMARY INDEX (PK);
The optimizer had to make another plan. It decided to do a MERGE join.
The second execution plan is evil. 228 disk accesses are needed: 2 disk accesses against 228! A huge difference.
Don’t omit uniqueness of primary indexes just for questionable reasons.