The Teradata NESTED Join And The Impact Of The Primary Index

The MERGE join is the most commonly utilized join strategy, or its equivalent for row partitioned tables, known as the row key-based merge join. One seldom-seen join strategy is the Teradata NESTED join.

The Teradata Optimizer must carefully consider query design to select the most cost-effective joining method.

First, we will examine the mechanics of a nested join and the technology that facilitates it.

What allows for nesting to occur? The Optimizer must perform a distinct index lookup on one table and an indexed search on the other table.

A distinct index search can refer to either primary index access or secondary index access. The second table must be accessible via any index, regardless of uniqueness.

In a fast-paced setting, where avoiding unique primary indexes is common practice to conceal duplicate key issues, numerous opportunities to perform a nested join go unnoticed.

In this test scenario, we will demonstrate the significant impact this can have on a system.

I created two test tables, allowing the NESTED join strategy to be used.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B INTEGER) PRIMARY INDEX (ATTRIB_B );

I inserted arbitrary data into both tables.

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1) FROM SYS_CALENDAR.CALENDAR;

I then created a query meeting the ideal NESTED join prerequisites.

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was retrieving the result set with only two disk accesses.

Subsequently, I removed the uniqueness constraint from the initial test table, changing it from UPI to NUPI. Without any unique secondary index, this will prevent the Optimizer from utilizing the NESTED join strategy.

DROP TABLE TABLE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A INTEGER) PRIMARY INDEX (PK);

The Optimizer formulated an alternative plan, opting to execute a MERGE join.

The second execution plan is highly inefficient, requiring a staggering 228 disk accesses instead of a mere 2.

Conclusion:

Do not disregard the uniqueness of primary indexes for dubious reasons.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

3 thoughts on “The Teradata NESTED join and the impact of the Primary Index”

Niraj

05/12/2016 at 5:27 am

Hi Roland,
Thanks for another great post.

Do you know the exact reason why 2 disks are accessed in the first case and 228 in second?
How the process works in 2 disk scenarios and different from 28 one.
Roland Wenzlofsky

12/03/2015 at 7:40 pm

You can find this information in the DBC.DBQLOGTBL. You will have logical and physical IOs available.
Virendra

12/03/2015 at 4:23 pm

Hi Roland,

How to find out the number of disk accesses or I/Os done in satisfying a particular query?

Regards,
Virendra

The Teradata NESTED join and the impact of the Primary Index

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

3 thoughts on “The Teradata NESTED join and the impact of the Primary Index”

Leave a Comment Cancel reply