The MERGE join is the most commonly utilized join strategy, or its equivalent for row partitioned tables, known as the row key-based merge join. One seldom-seen join strategy is the Teradata NESTED join.
The Teradata Optimizer must carefully consider query design to select the most cost-effective joining method.
First, we will examine the mechanics of a nested join and the technology that facilitates it.
What allows for nesting to occur? The Optimizer must perform a distinct index lookup on one table and an indexed search on the other table.
A distinct index search can refer to either primary index access or secondary index access. The second table must be accessible via any index, regardless of uniqueness.
In a fast-paced setting, where avoiding unique primary indexes is crucial to conceal duplicate key issues, numerous opportunities to perform a nested join go unnoticed.
In this test scenario, we will demonstrate the significant impact that can be inflicted upon a system.
I created two test tables, allowing the NESTED join strategy to be used.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B INTEGER) PRIMARY INDEX (ATTRIB_B );
I inserted arbitrary data into both tables.
INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1) FROM SYS_CALENDAR.CALENDAR;
I ultimately created a query meeting the ideal NESTED join prerequisites.
SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;
The outcome was retrieving the result set with only two disk accesses.
Subsequently, I removed the uniqueness constraint from the initial test table, changing it from UPI to NUPI. Without any unique secondary index, this will hinder the Optimizer from utilizing the NESTED join strategy.
DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) PRIMARY INDEX (PK);
The Optimizer formulated an alternative scheme, opting to execute a MERGE join.
The second execution plan is highly inefficient, requiring a staggering 228 disk accesses instead of a mere 2.
Do not disregard the distinctiveness of primary indexes for dubious motives.
Thanks for another great post.
Do you know the exact reason why 2 disks are accessed in the first case and 228 in second?
How the process works in 2 disk scenarios and different from 28 one.
You can find this information in the DBC.DBQLOGTBL. You will have logical and physical IOs available.
How to find out the number of disk accesses or I/Os done in satisfying a particular query?