Teradata 16 Introduces New Features for UNION ALL Queries

Roland Wenzlofsky

April 27, 2023

minutes reading time

Teradata UNION ALL – The New Features of Release 16

Teradata 16 introduces new features for UNION ALL queries that can enhance their performance, although their application is not guaranteed.

The Optimizer will opt for executing the query through conventional methods if the cost estimations for the traditional execution plan are lower. It is crucial to provide the Optimizer with accurate statistics.

Prior to Teradata 16, the execution plan for a join involving a derived table or view that includes a UNION ALL operation appeared as follows:

Create a common spool from all branches of the UNION ALL
Do the usual join preparation steps (rehashing, duplicating, etc.) for this spool.
Join the spool (left table) with the right table

Prior to Teradata 15.10, the Optimizer did not employ any reduction, such as WHERE conditions, until all branches of the statement including UNION ALL were consolidated in a shared spool.

Beginning with Teradata 16, the Optimizer can execute queries featuring UNION ALL differently:

Join spool of first UNION ALL branch (left table) against the right table
Join spool of second UNION ALL branch (left table) against the right table
Join spool of last UNION ALL branch (left table) against the right table
Merge the join results into a common spool

This method will require fewer resources than the original approach if the partial join stages result in smaller spools for each partial join.

Aggregations can be applied to each UNION ALL branch instead of consolidating all branches into a single spool before aggregation, which was the only option for the Optimizer before Teradata 15.10. The performance enhancement can be substantial if partial aggregations result in smaller spool files.

SELECT COUNT(*) queries containing UNION ALL now have an additional enhancement.

The Optimizer can efficiently calculate the number of rows for each branch by utilizing the cylinder index technique, which is the same method used for single tables, instead of consolidating all branches into a common spool and manually counting each row. The Optimizer then sums up the results.

Regrettably, the aforementioned improvements are incompatible with UNION queries. Kindly share your explanation in the comments.

Take a brief survey and receive the book for free Get instant access

Share0

Tweet0

Share0

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

Table1:
CLIENTNAME
Roland

Table2:
CLIENTNAME
Roland
Jiri
Patrick

UNION ALL: will return 1 row from Table1 + 3 rows from Table2 = 4 rows. This is deterministic, and every table separate will always return the same number of rows, so here you can push the calculations down to the branches.
UNION: as explained by Jiri, will remove duplicates. So only looking to table 2, you don’t know how many rows will be returned. You need to know the context from the other branches.

I think the answer is: UNION eliminates duplicate rows. There’s no way of telling whether there’re going to be any duplicates in the UNIONed set just by looking to Cylinder index.

🙂

Teradata 16 Introduces New Features for UNION ALL Queries

Teradata UNION ALL – The New Features of Release 16

Roland Wenzlofsky

Fast multi-file export of Teradata query results using only Teradata SQL Assistant

The Teradata AMP Worker Task

Boost Your Teradata Performance – The Critical Role of NOT NULL Declarations

Optimizing Teradata SQL Queries by Avoiding Full Table Scans and Utilizing Secondary Indexes