UNION and UNION ALL
The Teradata UNION statement combines the result set of at least two queries; in addition, UNION removes row duplicates. Each of the select statements must pick the same number of columns, the columns must be selected in the same order, and the datatypes of the column of all select statements must be compatible (more about this later). Below is an example:
FROM TheTable1SELECT col1 FROM TheTable1 UNION ALL SELECT col1 FROM TheTable2; UNION SELECT col1 FROM TheTable2 ;
If the datatypes for the same column of different select statements don’t match exactly, the one of the very first select statement will define the datatype for all rows of the column; if needed, Teradata does an implicit casting:
CREATE TABLE Customer
) PRIMARY INDEX (CustomerId);
CREATE TABLE Customer_Hist
) PRIMARY INDEX (CustomerId);
SELECT LastName FROM Customer UNION ALL SELECT LastName FROM Customer_Hist
The last names of longer than 100 characters are truncated from the Customer_Hist table in the example above. An error that is difficult to detect because the query runs without errors; incompatible data types lead to runtime errors.
Teradata UNION ALL vs. UNION Performance
The performance of UNION ALL is always better than that of UNION.
Since Teradata UNION only returns distinct rows, all select statements’ rows must be sorted into a common spool. For this purpose, it might be necessary to redistribute the rows because the rows of all result sets must have the same primary index for the sort step. Both the redistribution of the rows and the subsequent sorting require many IOs for large tables. Therefore we only should use UNION if we need to remove row duplicates. If there are no row duplicates in the data, we should always avoid the additional sort (and eventually redistribution) step using UNION ALL.
Significance of the Character Set
We have already discussed how different data types can affect the UNION or UNION ALL result set. The same is true for the character set. If in the example shown below, the column “col1” of table “Table1” is defined as a character set LATIN, but column “col1” of table “Table2” is defined as a character set UNICODE, a runtime error will occur if table “Table2” contains UNICODE characters in column “col1”, because the character set of the first select statement defines the character set of the result set:
FROM Table1 UNION SELECT col1 FROM Table2 ;
INTERSECT / INTERSECT ALL
The INTERSECT operator returns as the result row duplicate of two select statements ( all rows identical in both select statements).
FROM TheTable1 INSERTSECT SELECT col1,col2 FROM TheTable2 ;
Similar to Teradata UNION ALL, INTERSECT ALL returns duplicates without filtering them.
Similar to UNION ALL, INTERSECT ALL returns duplicates without filtering them. Again, it is recommended to use INTERSECT ALL instead of INTERSECT for better performance (the reasons are the same as for UNION ALL above).
MINUS / EXCEPT
The MINUS/EXCEPT operator returns a result set containing all rows of the first select statement that are not present in the second select statement; MINUS and EXCEPT are two different names for the same functionality:
FROM TheTable1SELECT col1,col2 FROM TheTable1 EXCEPT SELECT col1,col2 FROM TheTable2; MINUS SELECT col1,col2 FROM TheTable2 ;
MINUS is often used to determine whether two tables are completely identical. For this, the MINUS operator must be executed two times, as shown below:
FROM TheTable1SELECT col1,col2 FROM TheTable2 EXCEPT SELECT col1,col2 FROM TheTable1; MINUS SELECT col1,col2 FROM TheTable2 ;
We must also check whether both tables to be compared are populated. Otherwise, MINUS might provide equality of the tables, although it is not given.
Precedence of SET Operators
The set operators (UNION, UNION ALL, EXCEPT, MINUS) all have the same priority. When using SET operators, it is recommended that we always use parentheses to ensure the desired priority.