Have you met Teradata OLAP Functions in your data warehouse applications? Have you ever been asked to use them? Maybe you were struggling with understanding how they work or how they relate to the business problems they shall solve.
Alternatively, have you wondered why it takes official training material your company is paying dearly for dozens of printed pages to approach the topic while you do not see the principle behind it after having gone through it all?
If you have made any of these experiences, I would like to offer you a systematic approach to understanding and working with these aggregates. Any window aggregate function we came across manifests only five design decisions you have to take.
Here they are:
- Over what population of data (the “where”-decision)?
- For which aggregation levels (the “partition”-decision)?
- Over what lag and what lead (the “preceding-following” decision)?
- What kind of aggregation with what intent (the “function” decision)?
- Over which attribute(s) (the “grouping” decision)?
The first decision concerns the data population you wish to work on. Are all rows of a table relevant? What data layers does your report focus on? Note that this is the ordinary where-decision you must make for any SQL you write. In window aggregations, the where-decision is a two-storey building with the second floor being a conditional qualify option, given your data set decision. Having is the equivalent conditional option in simple aggregation.
Second, you have to decide on the levels of aggregation. Use the partition by option inside the over() part to make this decision.
Third, decide how many or over how many rows back and forth you would like to go to meet your desired result. The right preceding-following settings inside the over() part determine whether you create a moving average, cumulate results, or a training comparison of values.
Fourth, the aggregate you take co-determines what you get as a result. Note that the function you choose is a technical solution to the business problem’s semantics that does not necessarily sound like the implementation you decide on.
Fifth, you must decide which attributes to display with the result set. Although your options are limited here concerning creating result sets that make sense, you can still choose to show or hide columns, both transformed or untransformed.
None of the above decisions is independent of the other, but there are degrees of freedom.
To verify my approach and test your decision variations, you can create and fill the following table with dummy data and run variations of the below window aggregate, just altering one decision at a time.
CREATE SET TABLE TABLE_A_S3 (A INTEGER, B INTEGER, C INTEGER, D INTEGER ) PRIMARY INDEX (A) ;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 0 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 1 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 2 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 3 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 4 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 5 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 6 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 7 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 8 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 1 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 2 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 3 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 4 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 5 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 6 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 7 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 8 FROM SYS_CALENDAR.CALENDAR;
INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 9 FROM SYS_CALENDAR.CALENDAR;
— Your basic Window aggregate
— Decision 1: where B BETWEEN 70000 AND 70001, second level: QUALIFY D <AVG(D) OVER (PARTITION BY B)
— Decision 2: PARTITION BY B, C / PARTITION BY B in QUALIFY
— Decision 3: unlimited, i.e. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
— Decision 4: AVG()
— Decision 5: display A,B,C,D
sel A,B,C,D, AVG(D) OVER(PARTITION BY B, C ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING )
from TABLE_A_S3
where B BETWEEN 70000 AND 70001
QUALIFY D <AVG(D) OVER (PARTITION BY B)
;