fbpx

The most used join strategy is the MERGE join (or its equivalent for partitioned tables, the row key-based merge join). A join strategy that rarely puts in appearance is the Teradata NESTED join.

It needs a good deal of thought about query design to allow the Teradata Optimizer to pick this economical way of joining.

We will analyze at first how a NESTED join works and what is the enabling technology.

What enables the nested to join? The Optimizer needs to do a unique index lookup on one table and any indexed search on the other table.

A unique index search could be unique primary index access or unique secondary index access. The second table has to be accessible by any index (regardless of uniqueness).

In a quick and dirty environment, whereas a matter of principle unique primary indexes is avoided to mask out duplicate key problems, many possibilities to execute a nested join are missed.

In the following test scenario, we will show how significant the impact on a system could be:

I created two test tables, which allow the use of the NESTED join strategy.
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) UNIQUE PRIMARY INDEX (PK);
CREATE MULTISET TABLE TABLE_2 (PK INTEGER NOT NULL, ATTRIB_B  INTEGER) PRIMARY INDEX (ATTRIB_B );

I added some random records to both tables.

INSERT INTO TABLE_1 SELECT ROW_NUMBER() OVER ( ORDER BY 1), RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_2 SELECT ROW_NUMBER() OVER ( ORDER BY 1), ROW_NUMBER() OVER ( ORDER BY 1)  FROM SYS_CALENDAR.CALENDAR;

Finally, I designed a query which fitted the requirements for a NESTED join ideally:

SELECT * FROM TABLE_1 t01 INNER JOIN TABLE_2 t02 ON  t01.PK = T02.PK
WHERE t01.PK = 1 AND t02.ATTRIB_B = 10;

The result was, with only two disks accessed, the result set could be retrieved.

Afterward, I removed uniqueness from the first test table (UPI to NUPI). In the absence of any unique secondary index, this will prevent the Optimizer to apply the NESTED join strategy:

DROP TABLE TABE_1;
CREATE MULTISET TABLE TABLE_1 (PK INTEGER NOT NULL, ATTRIB_A  INTEGER) PRIMARY INDEX (PK);

The optimizer had to make another plan. It decided to do a MERGE join.

The second execution plan is evil. Two hundred twenty-eight disk accesses are needed:  2 disk accesses against 228! A huge difference.

Conclusion: 

Don’t omit the uniqueness of primary indexes just for questionable reasons.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Hi Roland,
    Thanks for another great post.

    Do you know the exact reason why 2 disks are accessed in the first case and 228 in second?
    How the process works in 2 disk scenarios and different from 28 one.

  • Hi Roland,

    How to find out the number of disk accesses or I/Os done in satisfying a particular query?

    Regards,
    Virendra

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >