Teradata UNION ALL – The New Features of Release 16
Teradata 16 introduces new features for UNION ALL queries that can enhance their performance, although their application is not guaranteed.
The Optimizer will opt for executing the query through conventional methods if the cost estimations for the traditional execution plan are lower. It is crucial to provide the Optimizer with accurate statistics.
Prior to Teradata 16, the execution plan for a join involving a derived table or view that includes a UNION ALL operation appeared as follows:
- Create a common spool from all branches of the UNION ALL
- Do the usual join preparation steps (rehashing, duplicating, etc.) for this spool.
- Join the spool (left table) with the right table
Prior to Teradata 15.10, the Optimizer did not employ any reduction, such as WHERE conditions, until all branches of the statement including UNION ALL were consolidated in a shared spool.
Beginning with Teradata 16, the Optimizer can execute queries featuring UNION ALL differently:
- Join spool of first UNION ALL branch (left table) against the right table
- Join spool of second UNION ALL branch (left table) against the right table
- Join spool of last UNION ALL branch (left table) against the right table
- Merge the join results into a common spool
This method will require fewer resources than the original approach if the partial join stages result in smaller spools for each partial join.
Aggregations can be applied to each UNION ALL branch instead of consolidating all branches into a single spool before aggregation, which was the only option for the Optimizer before Teradata 15.10. The performance enhancement can be substantial if partial aggregations result in smaller spool files.
SELECT COUNT(*) queries containing UNION ALL now have an additional enhancement.
The Optimizer can efficiently calculate the number of rows for each branch by utilizing the cylinder index technique, which is the same method used for single tables, instead of consolidating all branches into a common spool and manually counting each row. The Optimizer then sums up the results.
Regrettably, the aforementioned improvements are incompatible with UNION queries. Kindly share your explanation in the comments.
UNION ALL: will return 1 row from Table1 + 3 rows from Table2 = 4 rows. This is deterministic, and every table separate will always return the same number of rows, so here you can push the calculations down to the branches.
UNION: as explained by Jiri, will remove duplicates. So only looking to table 2, you don’t know how many rows will be returned. You need to know the context from the other branches.
I think the answer is: UNION eliminates duplicate rows. There’s no way of telling whether there’re going to be any duplicates in the UNIONed set just by looking to Cylinder index.