This article presents a case study from one of my past projects: an implementation that, for each primary key row containing a start date and an end date, must generate one row per day in between. I have simplified the query for demonstration purposes, but the problem is the same. Here is an example:
Input:

| Primary Key | Start Date | End Date |
|-------------|------------|----------|
| 1 | 2022-09-12 | 2022-09-15 |

Desired output:

| Primary Key | Start Date | End Date | Calculated Date |
|-------------|------------|----------|-----------------|
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-12 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-13 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-14 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-15 |
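To make the expansion logic concrete, here is a minimal Python sketch (illustrative only, not part of the original implementation) that turns one (key, start, end) row into one row per day:

```python
from datetime import date, timedelta

def expand_row(pk, start, end):
    """Yield one (pk, start, end, calc_date) tuple per day in [start, end]."""
    calc = start
    while calc <= end:
        yield (pk, start, end, calc)
        calc += timedelta(days=1)

rows = list(expand_row(1, date(2022, 9, 12), date(2022, 9, 15)))
for r in rows:
    print(r)  # four rows, one per calendar day
```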
First, let me introduce the implementation I had to optimize. The problem was solved with a recursive query, which worked well at first. Over time, however, the data volume grew enormously, and the query's runtime increased accordingly:
```sql
SET QUERY_BAND = 'DWHPRO=RECURSION;' FOR SESSION;

SELECT PK, START_DATE, END_DATE, CALC_DATE
FROM (
    WITH RECURSIVE RecTable (PK, START_DATE, END_DATE, CALC_DATE) AS
    (
        SELECT t01.PK, t01.START_DATE, DATE AS END_DATE, START_DATE AS CALC_DATE
        FROM PERFTEST t01
        WHERE START_DATE BETWEEN DATE '1900-01-01' AND DATE + 1
        UNION ALL
        SELECT PK, START_DATE, END_DATE, CALC_DATE + INTERVAL '1' DAY AS CALC_DATE
        FROM RecTable
        WHERE CALC_DATE < END_DATE
    )
    SELECT * FROM RecTable
) A
ORDER BY PK, CALC_DATE;
```
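Conceptually, the recursive CTE builds the result one iteration at a time: each pass reads the previous pass's spool and appends one more day per qualifying row, so a row spanning n days takes n passes. A minimal Python sketch of that behavior (an illustration of the iteration pattern, not of Teradata internals):

```python
from datetime import date, timedelta

# Seed step: one row per source row, with CALC_DATE = START_DATE.
seed = [(1, date(2022, 9, 12), date(2022, 9, 15), date(2022, 9, 12))]

result = list(seed)
current = seed
iterations = 0
while current:
    # Recursive step: add one day to CALC_DATE for every row
    # whose CALC_DATE is still below END_DATE.
    current = [(pk, s, e, c + timedelta(days=1))
               for (pk, s, e, c) in current if c < e]
    result.extend(current)
    iterations += 1

print(iterations, len(result))
```

Each iteration touches spool again, which is why I/O grows so sharply with the length of the expanded periods.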
| QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
|-----------|--------------|------------|------------|-----------|---------------|
| RECURSION | 700180812,00 | 38859,82 | 348420681728 | 13.09.2022 19:11:11.830000 | 13.09.2022 19:17:28.820000 |
Below you can see the optimized query, which uses Teradata's EXPAND ON clause:
```sql
SET QUERY_BAND = 'DWH=EXPAND;' FOR SESSION;

SELECT PK, START_DT, DATE AS END_DT, BEGIN(CC) AS CALC_DATE
FROM PERFTEST T01
WHERE START_DT BETWEEN DATE '1900-01-01' AND DATE
EXPAND ON PERIOD(START_DT, DATE + 1) AS CC BY INTERVAL '1' DAY
ORDER BY PK, CALC_DATE;
```
| QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
|-----------|--------------|------------|------------|-----------|---------------|
| EXPAND | 7084187,00 | 11141,82 | 348420681728 | 13.09.2022 19:19:23.120000 | 13.09.2022 19:19:41.360000 |
The optimized query requires 99% fewer I/Os and 71% fewer CPU seconds than the original query; spool usage is unchanged.
This case study shows how huge performance gains can often be achieved in Teradata simply by rewriting queries. The same goal can frequently be reached far more efficiently by changing the SQL, even without optimizer "hints" of the kind Oracle offers.
Would the CPU figures improve a bit by precalculating the EXPAND into a volatile table, CROSS JOINing it, and filtering out the unneeded values?
The idea is to avoid calling EXPAND for each returned record.
As mentioned in a previous article, EXPAND does not work in a subquery, hence the volatile table. But there was also a comment there describing how to fool the system and bypass this limitation.
Maybe that could be possible with XML_AGG as well?
We usually (cross) join a calendar table to such a query; the calendar table contains each date along with the end-of-day or end-of-month date records.
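The calendar-join idea described above could be sketched roughly as follows in Python, with `calendar` standing in for a hypothetical calendar table (names and date range are assumptions for illustration):

```python
from datetime import date, timedelta

# Hypothetical calendar "table": one entry per date in the covered range.
first = date(2022, 9, 1)
calendar = [first + timedelta(days=i) for i in range(30)]

# Source rows: (pk, start_date, end_date).
source = [(1, date(2022, 9, 12), date(2022, 9, 15))]

# Cross join against the calendar, then keep only the dates
# that fall inside each row's period.
expanded = [(pk, s, e, d)
            for (pk, s, e) in source
            for d in calendar
            if s <= d <= e]

for row in expanded:
    print(row)
```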
Good stuff, thank you.