What is the Teradata TD_WhichMax Function?
Teradata releases new SQL features with each update, providing enhanced functionality and improved performance with reduced resource usage.
We will examine the Teradata TD_WhichMax and TD_WhichMin functions and explore their potential benefits.
These functions retrieve the minimum or maximum value of a column and are commonly utilized in SQL using the following syntax:
SELECT * FROM DWHPRO.TESTTABLE
WHERE ID IN
(
SELECT MAX(ID)
FROM DWHPRO.TESTTABLE
);
Here is the Execution Plan:
3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
DWHPRO.TESTTABLE by way of an all-rows scan with no
residual conditions, grouping by field1 (25570). Aggregate
intermediate results are computed globally, then placed in Spool 2
in TD_Map1. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The size of Spool 2 is
estimated with high confidence to be 1 row (19 bytes). The
estimated time for this step is 44.87 seconds.
4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 2 (Last Use)
by way of an all-rows scan into Spool 6 (all_amps) (compressed
columns allowed), which is duplicated on all AMPs in TD_Map1. The
size of Spool 6 is estimated with high confidence to be 216 rows (
3,672 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step in TD_Map1 from Spool 6 (Last Use) by
way of an all-rows scan, which is joined to
DWHPRO.TESTTABLE by way of an all-rows scan with no
residual conditions. Spool 6 and DWHPRO.TESTTABLE are
joined using a product join, with a join condition of (
"DWHPRO.TESTTABLE.ID = Field_2"). The input
table DWHPRO.TESTTABLE will not be cached in memory,
but it is eligible for synchronized scanning. The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 67,590
rows (46,096,380 bytes). The estimated time for this step is
33.51 seconds.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 minute and 18 seconds.
The “TESTTABLE” contains 5 billion rows, and the “ID” column is not the primary key.
Teradata now offers the TD_WhichMax (or TD_WhichMin) functions as an alternative that provides the same functionality.
SELECT * FROM TD_WhichMax (
ON DWHPRO.TESTTABLE AS InputTable
USING TargetColumn ('ID')
) AS EVENT;
Here is the Execution Plan:
3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
DWHPRO.TESTTABLE by way of an all-rows scan with no
residual conditions, grouping by field1 (-5412). Aggregate
intermediate results are computed globally, then placed in Spool 4
in TD_Map1. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The size of Spool 4 is
estimated with high confidence to be 1 row (19 bytes). The
estimated time for this step is 44.87 seconds.
4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 (Last Use)
by way of an all-rows scan into Spool 2 (all_amps), which is built
locally on the AMPs. The size of Spool 2 is estimated with high
confidence to be 1 row (17 bytes). The estimated time for this
step is 0.00 seconds.
5) We do an all-AMPs DISPATCHER RETRIEVE step in TD_Map1 from Spool 2
(Last Use) by way of an all-rows scan and send the rows back to
the Dispatcher. The size is estimated with high confidence to be
1 row. The estimated time for this step is 0.00 seconds.
6) We do an all-AMPs RETRIEVE step in TD_MAP1 from
DWHPRO.TESTTABLE by way of an all-rows scan with a
condition of ("DWHPRO.TESTTABLE.ID = :%SSQ25")
into Spool 1 (group_amps), which is built locally on the AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 1 is estimated with no
confidence to be 456,832,786 rows (311,559,960,052 bytes). The
estimated time for this step is 2 minutes and 53 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 3 minutes and 38 seconds.
The performance metrics are as follows:
QueryBand | IO | CPU | Spool |
Traditional | 7730368,00 | 6128,56 | 8519680 |
TD_WhichMax | 7727713,00 | 5583,91 | 4419584 |
The number of IOs is nearly identical, and the CPU usage is slightly improved with the TD_WhichMax function. Additionally, the TD_WhichMax function utilizes only half the spool space compared to traditional SQL. This is due to the product join, which doubles the required spool space for conventional queries.
Is it safe to assume that Teradata’s TD_WhichMax function consistently outperforms the conventional subquery method? Not necessarily, as it also hinges on the Optimizer’s ability to incorporate the primary index into the execution plan. Consequently, we will compare both queries again, but this time with the “ID” column serving as the primary index for the table.
QueryBand | IO | CPU | Spool |
Traditional | 3863543,00 | 3191,43 | 8192 |
TD_WhichMax | 3863478,00 | 3075,52 | 8192 |
All performance metrics are similar after retrieving the query through the Primary Index.
Learn more about Teradata’s TD_WhatMax and TD_WhatMin functions at this resource.