Teradata OLAP Functions: A Systematic Approach To Understanding Window Aggregates

Are you familiar with the Teradata OLAP Functions in your data warehouse applications? Have you encountered a request to utilize them? You may have experienced difficulty comprehending their functionality or applicability to business issues they aim to address.

Have you ever questioned why your company’s official training material, which costs a significant amount of money, requires multiple printed pages to cover a topic you still don’t understand after completing it?

If you have encountered any of these situations, I can present you with a methodical approach to comprehending and handling these groups. Every window aggregate function we have encountered involves only five design choices.

Here they are:

Over what population of data (the “where”-decision)?
For which aggregation levels (the “partition”-decision)?
Over what lag and what lead (the “preceding-following” decision)?
What kind of aggregation with what intent (the “function” decision)?
Over which attribute(s) (the “grouping” decision)?

The initial decision involves the selection of the data population for analysis. It is important to consider whether all table rows are pertinent and which data layers the report will concentrate on. This is a fundamental decision that must be made for any SQL query. In window aggregations, the where-decision is a two-tiered structure, with the second level comprising a conditional qualify option, depending on the data set decision. Conversely, the conditional option in simple aggregation is represented by “having”.

Next, determine the appropriate levels of aggregation by utilizing the “partition by” option within the “over()” portion.

Determine the number of rows and columns for your desired result. The preceding and following settings in the over() function determine whether you create a moving average, cumulative result, or a comparison of values.

The aggregate you select determines the outcome. It is important to note that the function you opt for is a technical solution to the semantics of the business problem, which may not correspond exactly to the implementation you ultimately choose.

Next, determine which attributes to exhibit in the result set. While your ability to create logical result sets is restricted, you can still opt to display or conceal columns, whether they have been transformed or not.

Each decision is interdependent, but there are varying levels of flexibility.

To validate my method and experiment with different decisions, generate and complete the table below with fictitious data. Then, modify a single decision each time and execute the subsequent window aggregation.

CREATE SET TABLE TABLE_A_S3  (A INTEGER, B INTEGER, C INTEGER, D INTEGER ) PRIMARY INDEX (A) ;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 0  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 1  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 2  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 3  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 4  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 5  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 6  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 7  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 8 FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),1, 1  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),2, 2  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),3, 3  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),4, 4  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),5, 5  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),6, 6  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),7, 7  FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),8, 8 FROM  SYS_CALENDAR.CALENDAR;

INSERT INTO TABLE_A_S3 SELECT RANDOM(1,10000000), ROW_NUMBER() OVER ( ORDER BY 1),9, 9  FROM  SYS_CALENDAR.CALENDAR;

— Your basic Window aggregate

— Decision 1: where B BETWEEN 70000 AND 70001, second level: QUALIFY D <AVG(D) OVER (PARTITION BY B)

— Decision 2: PARTITION BY B, C / PARTITION BY B in QUALIFY

— Decision 3: unlimited, i.e. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

— Decision 4: AVG()

— Decision 5: display A,B,C,D

sel A,B,C,D, AVG(D) OVER(PARTITION BY B, C ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING )
from TABLE_A_S3
where B BETWEEN 70000 AND 70001
QUALIFY D <AVG(D) OVER (PARTITION BY B)

;

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

Leave a Comment Cancel reply