Tag Archive

Tag Archives for " union "

Teradata UNION & UNION ALL in a Nutshell

Both statements, the Teradata UNION and the UNION ALL, combine the result sets of two or more SQL statements.

The UNION ALL statement takes all rows from the intermediate results and puts them together into the result set. The UNION statement removes duplicate rows before returning the result set:



All statements combined with UNION have to return the same number of columns, and the data types of all columns across the participating select statements have to match. If they don’t match, the data types of the very first SQL select statement will be the relevant ones and not match columns of the remaining statements are implicitly cast to the same data type the very first select statement has.

Keep this in mind, especially if your column is a character data type, as this could cause hidden truncation of text columns –  a problem which is tough to discover. But now let’s come back to performance considerations: UNION ALL always performs better than UNION. Without exception. 

The reason is that UNION removes duplicates from the result set – the AMPs have to sort the resulting rows of all spools to be able to remove duplicates.
Sorting is expensive and makes the difference in performance between UNION and UNION ALL.

Only use UNION if you need to remove row duplicates. If there are no row duplicates in the data, always avoid the sort step by using UNION ALL.


Teradata UNION ALL Enhancements

Teradata UNION ALL – The New Features of Release 16

Teradata 16 introduces several new features for queries containing UNION ALL. The usage of these new features can help to improve query performance, but will not always be applied.

The optimizer will decide to execute the query without making use of these features if cost estimations for the traditional execution plan are lower. Like always, it’s important to support the optimizer with correct statistics.

Before Teradata 16, the execution plan for a join between a derived table or view containing UNION ALL looks like this:

  • Create a common spool from all branches of the UNION ALL
  • Do the usual join preparation steps (rehashing, duplicating, etc.) for this spool
  • Join the spool (left table) with the right table

Until Teradata 15.10, the optimizer will not apply any reduction (WHERE conditions, etc.) before all branches of the statement containing UNION ALL are placed in a common spool.

Starting with Teradata 16, the optimizer has the possibility to execute queries containing UNION ALL in a different way:

  • Join spool of first UNION ALL branch (left table) against the right table
  • Join spool of second UNION ALL branch (left table) against the right table
  • Join spool of last UNION ALL branch (left table) against the right table
  • Merge the join results into a common spool

This approach will consume fewer resources than the original one if the partial join steps lead to smaller result spools for each partial join.

Likewise, aggregations can be applied on every single UNION ALL branch instead of spooling all branches into a common spool before doing the aggregation (which was the only available option for the optimizer until Teradata 15.10). Again, the performance improvement can be significant if the partial aggregations lead to small spool files.

Another enhancement was introduced for SELECT COUNT(*) queries which are containing UNION ALL.

Instead of moving all branches into a common spool and counting the number of rows one by one, the Optimizer can now use the cylinder index (the same technique used for single tables) to count the rows of each branch in a very efficient way and sums the result up afterward.

Unfortunately, these new enhancements cannot be used for UNION queries. Dou you know why? Leave your answer in the comment section.

If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.