What is the Teradata TD_WhichMax Function?

Teradata releases new SQL features with each update, providing enhanced functionality and improved performance with reduced resource usage.

We will examine the Teradata TD_WhichMax and TD_WhichMin functions and explore their potential benefits.

These functions retrieve the minimum or maximum value of a column and are commonly utilized in SQL using the following syntax:

SELECT * FROM DWHPRO.TESTTABLE
WHERE ID IN 
(
   SELECT MAX(ID)
   FROM  DWHPRO.TESTTABLE
);

Here is the Execution Plan:

3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions, grouping by field1 (25570).  Aggregate
     intermediate results are computed globally, then placed in Spool 2
     in TD_Map1.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 2 is
     estimated with high confidence to be 1 row (19 bytes).  The
     estimated time for this step is 44.87 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 2 (Last Use)
     by way of an all-rows scan into Spool 6 (all_amps) (compressed
     columns allowed), which is duplicated on all AMPs in TD_Map1.  The
     size of Spool 6 is estimated with high confidence to be 216 rows (
     3,672 bytes).  The estimated time for this step is 0.01 seconds.
  5) We do an all-AMPs JOIN step in TD_Map1 from Spool 6 (Last Use) by
     way of an all-rows scan, which is joined to
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions.  Spool 6 and DWHPRO.TESTTABLE are
     joined using a product join, with a join condition of (
     "DWHPRO.TESTTABLE.ID = Field_2").  The input
     table DWHPRO.TESTTABLE will not be cached in memory,
     but it is eligible for synchronized scanning.  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with no confidence to be 67,590
     rows (46,096,380 bytes).  The estimated time for this step is
     33.51 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 1 minute and 18 seconds.

The “TESTTABLE” contains 5 billion rows, and the “ID” column is not the primary key.

Teradata now offers the TD_WhichMax (or TD_WhichMin) functions as an alternative that provides the same functionality.

SELECT * FROM TD_WhichMax (
 ON DWHPRO.TESTTABLE AS InputTable
  USING  TargetColumn ('ID')
AS EVENT;

Here is the Execution Plan:

  3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions, grouping by field1 (-5412).  Aggregate
     intermediate results are computed globally, then placed in Spool 4
     in TD_Map1.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 4 is
     estimated with high confidence to be 1 row (19 bytes).  The
     estimated time for this step is 44.87 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 (Last Use)
     by way of an all-rows scan into Spool 2 (all_amps), which is built
     locally on the AMPs.  The size of Spool 2 is estimated with high
     confidence to be 1 row (17 bytes).  The estimated time for this
     step is 0.00 seconds.
  5) We do an all-AMPs DISPATCHER RETRIEVE step in TD_Map1 from Spool 2
     (Last Useby way of an all-rows scan and send the rows back to
     the Dispatcher.  The size is estimated with high confidence to be
     1 row.  The estimated time for this step is 0.00 seconds.
  6) We do an all-AMPs RETRIEVE step in TD_MAP1 from
     DWHPRO.TESTTABLE by way of an all-rows scan with a
     condition of ("DWHPRO.TESTTABLE.ID = :%SSQ25")
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The input table will not be cached in memory, but it is eligible
     for synchronized scanning.  The result spool file will not be
     cached in memory.  The size of Spool 1 is estimated with no
     confidence to be 456,832,786 rows (311,559,960,052 bytes).  The
     estimated time for this step is 2 minutes and 53 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 3 minutes and 38 seconds.

The performance metrics are as follows:

QueryBandIOCPUSpool
Traditional7730368,006128,568519680
TD_WhichMax7727713,005583,914419584

The number of IOs is nearly identical, and the CPU usage is slightly improved with the TD_WhichMax function. Additionally, the TD_WhichMax function utilizes only half the spool space compared to traditional SQL. This is due to the product join, which doubles the required spool space for conventional queries.

Is it safe to assume that Teradata’s TD_WhichMax function consistently outperforms the conventional subquery method? Not necessarily, as it also hinges on the Optimizer’s ability to incorporate the primary index into the execution plan. Consequently, we will compare both queries again, but this time with the “ID” column serving as the primary index for the table.

QueryBandIOCPUSpool
Traditional3863543,003191,438192
TD_WhichMax3863478,003075,528192

All performance metrics are similar after retrieving the query through the Primary Index.

Learn more about Teradata’s TD_WhatMax and TD_WhatMin functions at this resource.

TD_WhatMax and TD_WhatMin

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>