October 22

# High-Performance Calculations with Teradata Ordered Analytic Functions

By Roland Wenzlofsky

October 22, 2019

Teradata Ordered Analytic Functions use two different methods to redistribute the data, depending on the data demographics.

Teradata Ordered Analytic Functions are useful if the result of a row is dependent on the values of previous or subsequent rows.

Here is an example:

The running sum over the last 3 days should be calculated. The following calculation steps are necessary:

1. The rows have to be sorted ascending by date
2. The summation is carried out by:
• The value of the current row is added to the sum of the previous row
• The value of the row that lies 3 days back is subtracted

All the above steps can be performed with a single SQL:

```SELECT SUM(Amount) OVER (PARTITION BY Department_Id ORDER BY Sales_Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS RunningSum
FROM Sales;```

### Teradata Ordered Analytic Functions & Performance Considerations

• When using Teradata Analytic Functions, it should be noted that any available compressed Join Index or Hash Index is not used.
• All rows which fall into the same partition are moved to the same AMP. This can lead to the situation that the AMP is running out of spool space.
• Since the rows of a partition are copied to the same AMP, skewing can also occur.

More specifically, Teradata uses one of 2 different methods to distribute the rows (and to reduce skew):

• If the number of different values of the PARTITION BY columns is high in comparison to the number of AMPS, the distribution is carried out according to the PARTITION BY key using the usual hashing algorithm on the partition by columns.
• If the number of different values of the PARTITION BY columns is small compared to the number of AMPS, the distribution is carried out according to ranges of key values defined by the PARTITION by columns plus the ORDER BY columns.

### A Simple Trick to reduce Skewing

We can extend the PARTITION BY key or the ORDER BY with additional columns that have as many different values as possible. Of course, the additional column must not change the semantics of the query.

Here is an example, where we added a column to the ORDER BY (which of course reduces skewing only if ORDER BY columns are considered by Teradata when distributing the rows):

#### Original Query (Heavily Skewed)

```SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC ) AS TotalSum
FROM Ranges;

MinAMPIO = 102
MaxAMPIO = 237

```

#### Optimized Query (Less Skewed)

```SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC,  QuiteUniqueColumn ) AS TotalSum
FROM Ranges;

MinAMPIO = 191
MaxAMPIO = 231
```

### How to determine how the Rows are distributed?

If distribution by ranges is used, can easily be found out: Check the execution plan for the statement “redistributed by value to all AMPs

```Explain SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC, QuiteUniqueColumn ) AS TotalSum
FROM Ranges;

1) First, we lock DWHPRO.Ranges in TD_MAP1 for read on a reserved
2) Next, we lock DWHPRO.Ranges in TD_MAP1 for read.
3) We do an all-AMPs STAT FUNCTION step in TD_MAP1 from DWHPRO.Ranges
by way of an all-rows scan with no residual conditions into Spool
5 (Last Use), which is assumed to be redistributed by value to all
AMPs in TD_Map1.  The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs.  The size is
estimated with low confidence to be 293,656 rows (12,039,896
bytes).  The estimated time for this step is 0.57 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.  The total estimated time is 0.57 seconds.
```

### Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like