The most used join strategy is the MERGE join (or its equivalent for partitioned tables, the row key-based merge join). A join strategy that rarely puts in appearance is the Teradata NESTED join.
It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.
We will analyze at first how a NESTED join works and what is the enabling technology.
What enables the nested to join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.
A unique index search could be unique primary index access or unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).
In a quick and dirty environment, whereas a matter of principle unique primary indexes is avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.
In the following test scenario, we will show how significant the impact on a system could be:
I created two test tables, which allow the use of the NESTED join strategy.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_AÂ INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_BÂ INTEGER) PRIMARY INDEX (ATTRIB_BÂ );
I added some random records to both tables.
INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1)Â FROM SYS_CALENDAR.CALENDAR;
Finally, I designed a query which fitted the requirements for a NESTED join ideally:
SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ONÂ t01.PK = T02.PK
WHERE t01.PK = 1Â AND t02.ATTRIB_B = 10;
The result was, with only two disks accessed, the result set could be retrieved.
Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:
DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_AÂ INTEGER) PRIMARY INDEX (PK);
The optimizer had to make another plan. It decided to do a MERGE join.
The second execution plan is evil. Two hundred twenty-eight disk accesses are needed: Â 2 disk accesses against 228! A huge difference.
Conclusion:Â
Don’t omit the uniqueness of primary indexes just for questionable reasons.
Hi Roland,
Thanks for another great post.
Do you know the exact reason why 2 disks are accessed in the first case and 228 in second?
How the process works in 2 disk scenarios and different from 28 one.
You can find this information in the DBC.DBQLOGTBL. You will have logical and physical IOs available.
Hi Roland,
How to find out the number of disk accesses or I/Os done in satisfying a particular query?
Regards,
Virendra