The most used join strategy is the MERGE join (or its equivalent for partitioned tables, the row key-based merge join). A join strategy that rarely puts in appearance is the Teradata NESTED join.
It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.
We will analyze at first how a NESTED join works and what is the enabling technology.
What enables the nested to join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.
A unique index search could be unique primary index access or unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).
In a quick and dirty environment, whereas a matter of principle unique primary indexes is avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.
In the following test scenario, we will show how significant the impact on a system could be:
I created two test tables, which allow the use of the NESTED join strategy.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B INTEGER) PRIMARY INDEX (ATTRIB_B );
I added some random records to both tables.
INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1) FROM SYS_CALENDAR.CALENDAR;
Finally, I designed a query which fitted the requirements for a NESTED join ideally:
SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;
The result was, with only two disks accessed, the result set could be retrieved.
Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:
DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) PRIMARY INDEX (PK);
The optimizer had to make another plan. It decided to do a MERGE join.
The second execution plan is evil. Two hundred twenty-eight disk accesses are needed: 2 disk accesses against 228! A huge difference.
Don’t omit the uniqueness of primary indexes just for questionable reasons.