Teradata – Why SELECT * impacts Performance?

Roland Wenzlofsky

January 14, 2016

minutes reading time


This article will teach you why selecting only the needed columns makes sense for performance reasons. Our example query performs a join over the two tables listed below:

CREATE SET TABLE TheDatabase.TheTable1
(
REF_PK BIGINT NOT NULL,
COL1 CHAR(20) NOT NULL,
COL2 INTEGER,

COL99 CHAR(100)
) PRIMARY INDEX ( COl1 );

CREATE MULTISET TABLE TheDatabase.TheTable2
(
PK BIGINT NOT NULL,
COL1 CHAR(20) NOT NULL,
COl2 INTEGER,

COL20 INTEGER
) PRIMARY INDEX ( COL1 )
INDEX ( REF_PK )
;

The primary indexes of both tables do not match the join columns, so the rows must be redistributed or copied to a common AMP before the join can occur. We created a NUSI on the join column (REF_PK) of the table “TheTable2” in our test scenario.

Our first query selects all columns from both tables:

Explain
SELECT *
FROm
TheDatabase.TheTable1 t01
INNER JOIN
TheTable2 t02
ON
t01.RELATED_PK = t02.PK

1) First, we lock a distinct TheDatabase.” pseudo table” for read on a
RowHash to prevent global deadlock for TheDatabase.t01.
2) Next, we lock Q_DBS_SGL_WORK.TheTable2 in view TheDatabase.TheTable2
for access, and we lock TheDatabase.t01 for read.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Q_DBS_SGL_WORK.TheTable2
in view TheDatabase.TheTable2 by way of an all-rows scan with
no residual conditions into Spool 2 (all_amps) (compressed
columns allowed) fanned out into 10 hash join partitions,
which is redistributed by the hash code of (
TheDatabase.TheTable2.PK) to all AMPs. The size of
Spool 2 is estimated with high confidence to be 16,206,188
rows (1,798,886,868 bytes).  The estimated time for this step
is 7.86 seconds.

2) We do an all-AMPs RETRIEVE step from TheDatabase.t01 by way
          of an all-rows scan with no residual conditions into Spool 3
(all_amps) (compressed columns allowed) fanned out into 10
hash join partitions, which is redistributed by the hash code
of (TheDatabase.t01.RELATED_PK) to all AMPs. The input table
will not be cached in memory, but it is eligible for
synchronized scanning. The result spool file will not be
cached in memory.  The size of Spool 3 is estimated with high
          confidence to be 25,914,926 rows (17,259,340,716 bytes).  The
estimated time for this step is 1 minute and 13 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a hash join
of 10 partitions, with a join condition of (“RELATED_PK = PK”).
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with low confidence to be
      36,760,336 rows (30,731,640,896 bytes).  The estimated time for
this step is 24.07 seconds.

5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 minute and 37 seconds.

As the above execution plan shows are, all rows of both tables redistributed by the hash code of the join column;  a full table scan is required as no row qualification via WHERE condition is done. The retrieve step estimations are considered to be of “high confidence”.

Improving The Teradata Column List

The second query selects only one column:

Explain SELECT
t01.BOI_CODE
FROm TheDatabase.TheTable1 t01
INNER JOIN
TheTable2 t02
ON
t01.RELATED_PK = t02.PK

1) First, we lock a distinct TheDatabase.” pseudo table” for read on a
RowHash to prevent global deadlock for TheDatabase.t01.
2) Next, we lock TheDatabase.TheTable2 in view TheDatabase.TheTable2
for access, and we lock TheDatabase.t01 for read.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from TheDatabase.TheTable2
in view TheDatabase.TheTable2 by way of a traversal of index #
          8 without accessing the base table with no residual
conditions into Spool 2 (all_amps) fanned out into 2 hash
          join partitions, which is redistributed by the hash code of (
TheDatabase.TheTable2.PK) to all AMPs. The size of
Spool 2 is estimated with high confidence to be 16,206,188
          rows (275,505,196 bytes).  The estimated time for this step
is 1.62 seconds.

2) We do an all-AMPs RETRIEVE step from TheDatabase.t01 by way
of an all-rows scan with no residual conditions into Spool 3
(all_amps) fanned out into 2 hash join partitions, which is
redistributed by the hash code of (
TheDatabase.t01.RELATED_PK) to all AMPs. The input table
will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 3 is estimated with
high confidence to be 25,914,926 rows (647,873,150 bytes).
The estimated time for this step is 8.72 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using a hash join of 2 partitions, with a join condition of (“RELATED_PK = PK”).
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with low confidence
to be 36,760,336 rows (1,066,049,744 bytes). The estimated time
for this step is 1.85 seconds.

5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 10.57 seconds.[/su_panel]

There are several improvements in the execution plan of the second query :

(1)  Although we retrieve in both cases the same 16,206,188 from the table “TheTable2”, the spool is much smaller as only the required columns of this table are put into the spool (275,505,196 bytes against 1,798,886,868 bytes). The same is valid for the spool of table “TheTable1” (647,873,150 bytes against 30,731,640,896 bytes).

(2) The join step requires only two hash join partitions instead of 10.

(3)  As no column of table “TheTable2” is selected, the optimizer traverses the NUSI instead of the base table. This would not have been the case if columns of table “TheTable2” had been part of the result set. The NUSI usage is reflected in the small spool for “TheTable2”

Below you can see the resulting reduction in resource usage:

 NumResultRowsTotalIOCountAMPCPUTimeSpool Usage
One Column47.314.937,00118.289,00120,632.367.786.496,00
All Columns47.314.937,00472.106,00522,0317.860.372.480,00

In the second query, we need 75% fewer IOs, 77% fewer CPU seconds, and 87% less spool space!

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>