Teradata UNION vs. UNION ALL – Performance Considerations

1
3339

teradata union all

Both statements, the Teradata UNION and the UNION ALL, combine the result sets of two or more SQL statements.

The UNION ALL statement takes all rows from the intermediate results and puts them together into the result set. The UNION statement removes duplicate rows before returning the result set:

SELECT <COLUMN1> FROM <TABLE_1>
UNION
SELECT <COLUMN1> FROM <TABLE_2> ;

SELECT <COLUMN1> FROM <TABLE_1>
UNION ALL
SELECT <COLUMN1> FROM <TABLE_2> ;

All statements combined with UNION have to return the same number of columns, and the data types of all columns across the participating select statements have to match. If they don’t match, the data types of the very first SQL select statement will be the relevant ones and not match columns of the remaining statements are implicitly cast to the same data type the very first select statement has.

Keep this in mind, especially if your column is a character data type, as this could cause hidden truncation of text columns –  a problem which is tough to discover. But now let’s come back to performance considerations: UNION ALL always performs better than UNION. Without exception. 

The reason is that UNION removes duplicates from the result set – the AMPs have to sort the resulting rows of all spools to be able to remove duplicates.
Sorting is expensive and makes the difference in performance between UNION and UNION ALL.

Only use UNION if you need to remove row duplicates. If there are no row duplicates in the data, always avoid the sort step by using UNION ALL.

Our Reader Score
[Total: 13    Average: 4.2/5]
Teradata UNION vs. UNION ALL – Performance Considerations written by Roland Wenzlofsky on May 15, 2014 average rating 4.2/5 - 13 user ratings

1 COMMENT

LEAVE A REPLY

Please enter your comment!
Please enter your name here