This article presents a case study from my previous projects. Specifically, it showcases an implementation that generates rows for all days between a start date and an end date based on a primary key. To illustrate the issue, I have simplified the query. The following example serves as a demonstration:
Primary Key | Start Date | End Date |
1 | 2022-09-12 | 2022-09-15 |
Primary Key | Start Date | End Date | Calculated Date |
1 | 2022-09-12 | 2022-09-15 | 2022-09-12 |
1 | 2022-09-12 | 2022-09-15 | 2022-09-13 |
1 | 2022-09-12 | 2022-09-15 | 2022-09-14 |
1 | 2022-09-12 | 2022-09-15 | 2022-09-15 |
Allow me to acquaint you with the implementation I optimized. Initially, the problem was resolved using a recursive query, which performed favorably. However, as the data expanded significantly, the query’s runtime escalated, substantially increasing.
SET QUERY_BAND = ‘DWHPRO=RECURSION;’ FOR SESSION;
SELECT PK, START_DATE, END_DATE, CALC_DATE
FROM (
WITH RECURSIVE RecTable (PK,START_DATE,END_DATE,CALC_DATE) AS
(SELECT t01.PK,t01.START_DATE, DATE AS END_DATE, START_DATE AS CALC_DATE
FROM PERFTEST t01
WHERE START_DATE BETWEEN DATE’1900-01-01′ AND DATE+1UNION ALL
SELECT PK,START_DATE,END_DATE,CALC_DATE + INTERVAL ‘1’ DAY AS CALC_DATE
FROM RecTable
WHERE (CALC_DATE < END_DATE )
)
SELECT * FROM RecTable
) A
ORDER BY PK, CALC_DATE;
QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
RECURSION | 700180812,00 | 38859,82 | 348420681728 | 13.09.2022 19:11:11.830000 | 13.09.2022 19:17:28.820000 |
Below you can see the optimized query:
SET QUERY_BAND = ‘DWH=EXPAND;’ FOR SESSION;
SELECT PK, START_DT, DATE AS END_DT, BEGIN(CC) AS CALC_DATE
FROM PERFTEST T01
WHERE START_DT BETWEEN DATE’1900-01-01′ AND DATE
EXPAND ON PERIOD(START_DT,DATE+1) AS CC BY INTERVAL ‘1’ DAY
ORDER BY PK,CALC_DATE
QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
EXPAND | 7084187,00 | 11141,82 | 348420681728 | 13.09.2022 19:19:23.120000 | 13.09.2022 19:19:41.360000 |
The improved query necessitates 99% fewer IO operations than the initial query and 71% fewer CPU seconds. Spool utilization has remained constant.
The case study highlighted Teradata’s potential for significant performance improvements through query optimization. Altering the SQL itself can often yield greater efficiency without using Oracle’s “hints.”
Would the CPU figures improve a bit by precalculating the EXPAND in a volatile, CROSS JOIN, and filter out unneeded values?
The idea is to avoid calling EXPAND for each returned record.
As mentioned in a previous article, EXPAND does not work as a sub query, the volatile hence. But there was also a comment there how to fool the system and bypass this limitation.
Maybe that could be possible with XML_AGG as well?
We usually (cross) join a calendar table to such a query which contains each date and the end of day or end of month date records.
Good staff, thank you .