Teradata OLAP Functions

0
1000

 

Have you met Teradata OLAP Functions in your data warehouse applications? Have you ever been asked to use them? Maybe you were struggling with understanding how they work or how they relate to the business problems they shall solve.

Alternatively, have you wondered why it takes official training material your company is paying dearly for dozens of printed pages to approach the topic while you do not see the principle behind after having gone through it all?

If you have made any of these experiences, I would like to offer you a systematic approach to understanding and working with these aggregates. Any window aggregate function we came across is a manifestation of only five design decisions you have to take.

Here they are:

  1. Over what population of data (the “where”-decision) ?
  2. For which aggregation levels (the “partition”-decision) ?
  3. Over what lag and what lead (the “preceding-following” decision) ?
  4. What kind of aggregation with what intent (the “function” decision) ?
  5. Over which attribute(s) (the “grouping” decision) ?

The first decision concerns the data population you wish to work on. Are all rows of a table relevant? What data layers does your report focus on? Note that this is the ordinary where-decision you have to make for any kind of SQL you write. In window aggregations, the where-decision is a two-storey building with the second floor being a conditional qualify-option, given your data set decision. Having is the equivalent conditional option in simple aggregation.

Second, you have to decide on the levels of aggregation. Use the partition by option inside the over() part to make this decision.

Third, decide how many or over how many rows back and forth you would like to go to meet your desired result. With the right preceding-following settings inside the over() part determine whether you create a moving average, cumulate results or a training comparison of values.

Fourth, the aggregate you take co-determines what you get as a result. Note that the function you choose is a technical solution to the semantics of the business problem that does not necessarily sound like the implementation you decide on.

Fifth, you have to decide on which attributes to display with the result set. Although your options are limited here with respect to creating result sets that make sense, you can still decide to show or hide columns, both transformed or untransformed.

None of the above decisions is totally independent of the other, but there are degrees of freedom for each of them.

To verify my approach and test your decision variations, you can create and fill the following table with dummy data and run variations of the below window aggregate, just altering one decision at a time.

CREATE SET TABLE TABLE_A_S3  (A INTEGER, B INTEGER, C INTEGER, D INTEGER ) PRIMARY INDEX (A) ;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 0  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 1  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 2  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 3  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 4  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 5  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 6  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 7  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 8 FROM  SYS_CALENDAR.CALENDAR;

 

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 1  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 2  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 3  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 4  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 5  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 6  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 7  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 8 FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 9  FROM  SYS_CALENDAR.CALENDAR;

 

— Your basic Window aggregate

— Decision 1: where B BETWEEN 70000 AND 70001, second level: QUALIFY D <AVG(D) OVER (PARTITION BY B)

— Decision 2: PARTITION BY B, C / PARTITION BY B in QUALIFY

— Decision 3: unlimited, i.e. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

— Decision 4:  AVG()

— Decision 5: display A,B,C,D

sel A,B,C,D, AVG(D) OVER(PARTITION BY B, C ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING )

from TABLE_A_S3

where B BETWEEN 70000 AND 70001

QUALIFY D <AVG(D) OVER (PARTITION BY B)

;

 

Our Reader Score
[Total: 1    Average: 5/5]
Teradata OLAP Functions written by Paul Timar on April 14, 2014 average rating 5/5 - 1 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here