 High-Performance Calculations with Teradata Ordered Analytic Functions

# High-Performance Calculations with Teradata Ordered Analytic Functions Teradata Ordered Analytic Functions use two different methods to redistribute the data, depending on the data demographics.

Teradata Ordered Analytic Functions are useful if the result of a row is dependent on the values of previous or subsequent rows.

Here is an example:

The running sum over the last 3 days should be calculated. The following calculation steps are necessary:

1. The rows have to be sorted ascending by date
2. The summation is carried out by:
• The value of the current row is added to the sum of the previous row
• The value of the row that lies 3 days back is subtracted

All the above steps can be performed with a single SQL:

```SELECT SUM(Amount) OVER (PARTITION BY Department_Id ORDER BY Sales_Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS RunningSum
FROM Sales;```

### Teradata Ordered Analytic Functions & Performance Considerations

• When using Teradata Analytic Functions, it should be noted that any available compressed Join Index or Hash Index is not used.
• All rows which fall into the same partition are moved to the same AMP. This can lead to the situation that the AMP is running out of spool space.
• Since the rows of a partition are copied to the same AMP, skewing can also occur.

More specifically, Teradata uses one of 2 different methods to distribute the rows (and to reduce skew):

• If the number of different values of the PARTITION BY columns is high in comparison to the number of AMPS, the distribution is carried out according to the PARTITION BY key using the usual hashing algorithm on the partition by columns.
• If the number of different values of the PARTITION BY columns is small compared to the number of AMPS, the distribution is carried out according to ranges of key values defined by the PARTITION by columns plus the ORDER BY columns.

### A Simple Trick to reduce Skewing

We can extend the PARTITION BY key or the ORDER BY with additional columns that have as many different values as possible. Of course, the additional column must not change the semantics of the query.

Here is an example, where we added a column to the ORDER BY (which of course reduces skewing only if ORDER BY columns are considered by Teradata when distributing the rows):

#### Original Query (Heavily Skewed)

```SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC ) AS TotalSum
FROM Ranges;

MinAMPIO = 102
MaxAMPIO = 237

```

#### Optimized Query (Less Skewed)

```SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC,  QuiteUniqueColumn ) AS TotalSum
FROM Ranges;

MinAMPIO = 191
MaxAMPIO = 231
```

### How to determine how the Rows are distributed?

If distribution by ranges is used, can easily be found out: Check the execution plan for the statement “redistributed by value to all AMPs

```Explain SELECT SUM(Amount) OVER (PARTITION BY Customer_Id ORDER BY Due_Date ASC, QuiteUniqueColumn ) AS TotalSum
FROM Ranges;

1) First, we lock DWHPRO.Ranges in TD_MAP1 for read on a reserved