Teradata employs two distinct approaches in Ordered Analytic Functions for preparing the data layout necessary for processing. This article explains both approaches and their respective advantages and disadvantages.
Teradata Ordered Analytical Functions
Teradata Analytic Functions are versatile tools that allow for a wide range of applications. The ability to retrieve previous and subsequent rows is critical in many queries.
To calculate the running total of a metric over the past three days, follow these steps:
- The rows have to be sorted ascending by date
- The summation is carried out:
- The value of the current row is added to the sum of the previous row
- The value of the row that lies three days back is subtracted
All of these steps can be executed using a single SQL statement.
SELECT SUM(Amount) OVER (PARTITION BY Department_Id ORDER BY Sales_Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS RunningSum
FROM Sales;
Teradata Ordered Analytical Functions & Performance Considerations
- Teradata Analytical Functions do not use any available compressed Join Index or Hash Index.
- All rows which fall into the same partition are moved to the same AMP. Changing data geography can cause “out of spool” situations. By partition, we mean the columns defined in the PARTITION BY part of the Ordered Analytic Function. This is not to be confused with Row Level Partitioning.
- Skewing can occur since the rows of a partition are copied to the same AMP.
Teradata evenly distributes rows across AMPs using one of two methods.
- Suppose the number of distinct values of the PARTITION BY columns is higher than the number of AMPs. In that case, the distribution is carried out according to the PARTITION BY key using the well-known hashing algorithm on the PARTITION BY columns.
- Suppose the number of distinct values of the PARTITION BY columns is lower than the number of AMPs. In that case, the distribution is carried out according to the ranges of key values defined by the PARTITION BY columns plus the ORDER BY column(s).
A Simple Trick to Reduce the Skewing
We can expand the PARTITION BY or ORDER BY clause by adding columns with more diverse values. However, the additional column should not alter the query’s meaning.
We added a column to the ORDER BY clause, which reduces skewing only if the clause is utilized for data distribution.
Original Query (Heavily Skewed)
SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC ) AS TotalSum
FROM Ranges;
MinAMPIO = 102
MaxAMPIO = 237
Optimized Query (Less Skewed)
SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC, QuiteUniqueColumn ) AS TotalSum
FROM Ranges;
MinAMPIO = 191
MaxAMPIO = 231
How to determine how the Rows are distributed?
If range distribution is utilized, the execution plan will display the statement “redistributed by value to all AMPs.”
Explain SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC, QuiteUniqueColumn ) AS TotalSum
FROM Ranges;
1) First, we lock DWHPRO.Ranges in TD_MAP1 for read on a reserved
RowHash to prevent global deadlock.
2) Next, we lock DWHPRO.Ranges in TD_MAP1 for read.
3) We do an all-AMPs STAT FUNCTION step in TD_MAP1 from DWHPRO.Ranges
by way of an all-rows scan with no residual conditions into Spool
5 (Last Use), which is assumed to be redistributed by value to all
AMPs in TD_Map1. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. The size is
estimated with low confidence to be 293,656 rows (12,039,896
bytes). The estimated time for this step is 0.57 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.57 seconds.
Related Services
⚡ Need Help Optimizing Your Data Platform?
We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.
Explore Our Services →📋 Considering a Move From Teradata?
Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.
Free Migration Assessment →