Why Selecting Only Needed Columns Boosts Performance: Teradata Query Example

Roland Wenzlofsky

April 22, 2023

minutes reading time

Learn the benefits of choosing necessary columns for optimal performance. Our query in question involves a join between two specified tables:

CREATE SET TABLE TheDatabase.TheTable1
(
REF_PK BIGINT NOT NULL,
COL1 CHAR(20) NOT NULL,
COL2 INTEGER,
…
COL99 CHAR(100)
) PRIMARY INDEX ( COl1 );

CREATE MULTISET TABLE TheDatabase.TheTable2
(
PK BIGINT NOT NULL,
COL1 CHAR(20) NOT NULL,
COl2 INTEGER,
…
COL20 INTEGER
) PRIMARY INDEX ( COL1 )
INDEX ( REF_PK )
;

The rows of two tables with different primary indexes must be redistributed or copied to a common AMP to perform the join. As a solution, we created a NUSI on the join column (REF_PK) of “TheTable2” during our test scenario.

Our initial query retrieves all columns from both tables:

Explain
SELECT *
FROm
TheDatabase.TheTable1 t01
INNER JOIN
TheTable2 t02
ON
t01.RELATED_PK = t02.PK

1) First, we lock a distinct TheDatabase.” pseudo table” for read on a
RowHash to prevent global deadlock for TheDatabase.t01.
2) Next, we lock Q_DBS_SGL_WORK.TheTable2 in view TheDatabase.TheTable2
for access, and we lock TheDatabase.t01 for read.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Q_DBS_SGL_WORK.TheTable2
in view TheDatabase.TheTable2 by way of an all-rows scan with
no residual conditions into Spool 2 (all_amps) (compressed
columns allowed) fanned out into 10 hash join partitions,
which is redistributed by the hash code of (
TheDatabase.TheTable2.PK) to all AMPs. The size of
Spool 2 is estimated with high confidence to be 16,206,188
rows (1,798,886,868 bytes). The estimated time for this step
is 7.86 seconds.

2) We do an all-AMPs RETRIEVE step from TheDatabase.t01 by way
of an all-rows scan with no residual conditions into Spool 3
(all_amps) (compressed columns allowed) fanned out into 10
hash join partitions, which is redistributed by the hash code
of (TheDatabase.t01.RELATED_PK) to all AMPs. The input table
will not be cached in memory but is eligible for
synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 3 is estimated with high
confidence to be 25,914,926 rows (17,259,340,716 bytes). The
estimated time for this step is 1 minute and 13 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a hash join
of 10 partitions, with a join condition of (“RELATED_PK = PK”).
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be
36,760,336 rows (30,731,640,896 bytes). The estimated time for
this step is 24.07 seconds.

5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 minute and 37 seconds.

The execution plan above indicates that both tables have their rows redistributed based on the hash code of the join column. No WHERE condition is applied. Therefore, a full table scan is necessary. The retrieve step estimations are deemed highly confident.

Improving The Teradata Column List

The second query selects a single column.

Explain SELECT
t01.BOI_CODE
FROm TheDatabase.TheTable1 t01
INNER JOIN
TheTable2 t02
ON
t01.RELATED_PK = t02.PK

1) First, we lock a distinct TheDatabase.” pseudo table” for read on a
RowHash to prevent global deadlock for TheDatabase.t01.
2) Next, we lock TheDatabase.TheTable2 in view TheDatabase.TheTable2
for access, and we lock TheDatabase.t01 for read.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from TheDatabase.TheTable2
in view TheDatabase.TheTable2 by way of a traversal of index #
          8 without accessing the base table with no residual
conditions into Spool 2 (all_amps) fanned out into 2 hash
          join partitions, which is redistributed by the hash code of (
TheDatabase.TheTable2.PK) to all AMPs. The size of
Spool 2 is estimated with high confidence to be 16,206,188
          rows (275,505,196 bytes). The estimated time for this step
is 1.62 seconds.

2) We do an all-AMPs RETRIEVE step from TheDatabase.t01 by way
of an all-rows scan with no residual conditions into Spool 3
(all_amps) fanned out into 2 hash join partitions, which is
redistributed by the hash code of (
TheDatabase.t01.RELATED_PK) to all AMPs. The input table
will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 3 is estimated with
high confidence to be 25,914,926 rows (647,873,150 bytes).
The estimated time for this step is 8.72 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a hash join of 2 partitions, with a join condition of (“RELATED_PK = PK”).
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with low confidence
to be 36,760,336 rows (1,066,049,744 bytes). The estimated time
for this step is 1.85 seconds.

The second query’s execution plan has undergone multiple enhancements.

(1) While both cases retrieve 16,206,188 from “TheTable2”, the spool is much smaller as only the necessary columns are included (275,505,196 bytes instead of 1,798,886,868 bytes). This also applies to the spool of “TheTable1” (647,873,150 bytes instead of 30,731,640,896 bytes).

The join step only needs two hash join partitions rather than 10.

When “TheTable2” column is not selected, the optimizer uses the NUSI instead of the base table. However, this would not be the case if columns from “TheTable2” were included in the result set. The utilization of the NUSI is apparent from the small spool size for “TheTable2”.

Observe the decrease in resource consumption below:

	NumResultRows	TotalIOCount	AMPCPUTime	Spool Usage
One Column	47.314.937,00	118.289,00	120,63	2.367.786.496,00
All Columns	47.314.937,00	472.106,00	522,03	17.860.372.480,00