Teradata TD_WhichMax and TD_WhichMin Functions: A SQL Solution for Finding Min/Max Value of a Column with Low Resource Consumption

Roland Wenzlofsky

May 5, 2023

minutes reading time

What is the Teradata TD_WhichMax Function?

Teradata releases new SQL features with each update, providing enhanced functionality and improved performance with reduced resource usage.

We will examine the Teradata TD_WhichMax and TD_WhichMin functions and explore their potential benefits.

These functions retrieve the minimum or maximum value of a column and are commonly utilized in SQL using the following syntax:

SELECT * FROM DWHPRO.TESTTABLE
WHERE ID IN 
(
   SELECT MAX(ID)
   FROM  DWHPRO.TESTTABLE
);

Here is the Execution Plan:

3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions, grouping by field1 (25570).  Aggregate
     intermediate results are computed globally, then placed in Spool 2
     in TD_Map1.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 2 is
     estimated with high confidence to be 1 row (19 bytes).  The
     estimated time for this step is 44.87 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 2 (Last Use)
     by way of an all-rows scan into Spool 6 (all_amps) (compressed
     columns allowed), which is duplicated on all AMPs in TD_Map1.  The
     size of Spool 6 is estimated with high confidence to be 216 rows (
     3,672 bytes).  The estimated time for this step is 0.01 seconds.
  5) We do an all-AMPs JOIN step in TD_Map1 from Spool 6 (Last Use) by
     way of an all-rows scan, which is joined to
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions.  Spool 6 and DWHPRO.TESTTABLE are
     joined using a product join, with a join condition of (
     "DWHPRO.TESTTABLE.ID = Field_2").  The input
     table DWHPRO.TESTTABLE will not be cached in memory,
     but it is eligible for synchronized scanning.  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with no confidence to be 67,590
     rows (46,096,380 bytes).  The estimated time for this step is
     33.51 seconds.
  6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 1 minute and 18 seconds.

The “TESTTABLE” contains 5 billion rows, and the “ID” column is not the primary key.

Teradata now offers the TD_WhichMax (or TD_WhichMin) functions as an alternative that provides the same functionality.

SELECT * FROM TD_WhichMax (
 ON DWHPRO.TESTTABLE AS InputTable
  USING  TargetColumn ('ID')
) AS EVENT;

Here is the Execution Plan:

  3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     DWHPRO.TESTTABLE by way of an all-rows scan with no
     residual conditions, grouping by field1 (-5412).  Aggregate
     intermediate results are computed globally, then placed in Spool 4
     in TD_Map1.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 4 is
     estimated with high confidence to be 1 row (19 bytes).  The
     estimated time for this step is 44.87 seconds.
  4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 4 (Last Use)
     by way of an all-rows scan into Spool 2 (all_amps), which is built
     locally on the AMPs.  The size of Spool 2 is estimated with high
     confidence to be 1 row (17 bytes).  The estimated time for this
     step is 0.00 seconds.
  5) We do an all-AMPs DISPATCHER RETRIEVE step in TD_Map1 from Spool 2
     (Last Use) by way of an all-rows scan and send the rows back to
     the Dispatcher.  The size is estimated with high confidence to be
     1 row.  The estimated time for this step is 0.00 seconds.
  6) We do an all-AMPs RETRIEVE step in TD_MAP1 from
     DWHPRO.TESTTABLE by way of an all-rows scan with a
     condition of ("DWHPRO.TESTTABLE.ID = :%SSQ25")
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The input table will not be cached in memory, but it is eligible
     for synchronized scanning.  The result spool file will not be
     cached in memory.  The size of Spool 1 is estimated with no
     confidence to be 456,832,786 rows (311,559,960,052 bytes).  The
     estimated time for this step is 2 minutes and 53 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 3 minutes and 38 seconds.

The performance metrics are as follows:

QueryBand	IO	CPU	Spool
Traditional	7730368,00	6128,56	8519680
TD_WhichMax	7727713,00	5583,91	4419584

The number of IOs is nearly identical, and the CPU usage is slightly improved with the TD_WhichMax function. Additionally, the TD_WhichMax function utilizes only half the spool space compared to traditional SQL. This is due to the product join, which doubles the required spool space for conventional queries.

Is it safe to assume that Teradata’s TD_WhichMax function consistently outperforms the conventional subquery method? Not necessarily, as it also hinges on the Optimizer’s ability to incorporate the primary index into the execution plan. Consequently, we will compare both queries again, but this time with the “ID” column serving as the primary index for the table.

QueryBand	IO	CPU	Spool
Traditional	3863543,00	3191,43	8192
TD_WhichMax	3863478,00	3075,52	8192

All performance metrics are similar after retrieving the query through the Primary Index.

Learn more about Teradata’s TD_WhatMax and TD_WhatMin functions at this resource.

TD_WhatMax and TD_WhatMin

Take a brief survey and receive the book for free Get instant access

Share0

Tweet0

Share0

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

Teradata TD_WhichMax and TD_WhichMin Functions: A SQL Solution for Finding Min/Max Value of a Column with Low Resource Consumption

What is the Teradata TD_WhichMax Function?

Roland Wenzlofsky

Fast multi-file export of Teradata query results using only Teradata SQL Assistant

The Teradata AMP Worker Task

Boost Your Teradata Performance – The Critical Role of NOT NULL Declarations

Optimizing Teradata SQL Queries by Avoiding Full Table Scans and Utilizing Secondary Indexes