Optimizing Join and Aggregation with Teradata UNION ALL Views

Roland Wenzlofsky

April 23, 2023

minutes reading time

Teradata UNION ALL Introduction

Teradata introduced a new Optimizer feature for managing sets of rows combined by UNION ALL in views and derived tables.

In earlier versions of Teradata, rows combined with Teradata UNION ALL were consolidated into a shared spool before further operations like joining another table or performing an aggregation step. Consider the following example of a join operation:

REPLACE VIEW TestView AS
(
SELECT PK, COL1 FROM TheTableA
UNION ALL
SELECT PK, COL1 FROM TheTableB
) ;

SELECT *
FROM
TestView T01
INNER JOIN
TestTable T02
ON
T01.PK = T02.PK
WHERE
T01.COL2 = 100;

The Teradata 15.10 Optimizer creates a common spool and performs a join operation with the “TestTable”.

The common UNION ALL spool will not reduce rows during construction. All join preparation steps, including redistribution on join columns, must be performed using the spool created from the combination of all UNION ALL branches.

The Optimizer in Teradata 16 enables the division of a join into several segments and an additional choice.

Each Teradata UNION ALL branch can be joined separately to the “TestTable”. The final outcome of these join processes is stored in a final spool.

To format the query for presentation, we use the following structure:

SELECT * FROMTheTableA T01
INNER JOIN
TestTable T02
ON T01.PK = T02.PK;
WHERE T01.COL2 = 100;

SELECT * FROMTheTableB T01
INNER JOIN
TestTable T02
ON T01.PK = T02.PK;
WHERE T01.COL2 = 100;

This option can save resources by effectively decreasing the selected row count in every join.

The potential for decreased resource consumption rises as tables become larger and are merged using UNION ALL before being joined with an additional table.

The Teradata 16 Optimizer can apply aggregation to each branch of UNION ALL separately and then execute global aggregation in a subsequent step.

Before Teradata 15.10, aggregation occurred once all UNION ALL branches had been consolidated into a shared spool.

The primary benefit is counting the rows of a view or derived table that includes multiple branches merged with UNION ALL. Before Teradata 15.10, a common spool was created, which required a row count.

Starting with Teradata 14.10, the Optimizer has implemented a uniform optimization method for individual tables. This entails computing the row count on the Cylinder Index for each table linked by a UNION ALL operation and subsequently summing up the results. This technique remains in use in Teradata 16.

The Optimizer can utilize the aforementioned techniques to enhance performance by reducing spool sizes by pushing aggregation or joining into each UNION ALL branch.

The Optimizer heavily relies on statistics to determine the feasibility of implementing novel methods.