This article presents a case study from one of my past projects: an implementation that, for each primary key row containing a start date and an end date, must generate one row per day in between. I have simplified the query for demonstration purposes, but the problem is the same. Here is an example:
Input:

| Primary Key | Start Date | End Date |
|-------------|------------|----------|
| 1 | 2022-09-12 | 2022-09-15 |

Desired output:

| Primary Key | Start Date | End Date | Calculated Date |
|-------------|------------|----------|-----------------|
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-12 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-13 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-14 |
| 1 | 2022-09-12 | 2022-09-15 | 2022-09-15 |
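To make the expansion logic concrete, here is a minimal Python sketch (illustrative only, not part of the original implementation) that turns one (key, start, end) row into one row per day:

```python
from datetime import date, timedelta

def expand_row(pk, start, end):
    """Yield one (pk, start, end, calc_date) tuple per day in [start, end]."""
    calc = start
    while calc <= end:
        yield (pk, start, end, calc)
        calc += timedelta(days=1)

rows = list(expand_row(1, date(2022, 9, 12), date(2022, 9, 15)))
for r in rows:
    print(r)  # four rows, one per calendar day
```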
First, let me introduce the implementation I had to optimize. The problem was solved with a recursive query, which worked well at first. Over time, however, the data volume grew enormously, and the query's runtime increased accordingly:
```sql
SET QUERY_BAND = 'DWHPRO=RECURSION;' FOR SESSION;

SELECT PK, START_DATE, END_DATE, CALC_DATE
FROM (
    WITH RECURSIVE RecTable (PK, START_DATE, END_DATE, CALC_DATE) AS
    (
        SELECT t01.PK, t01.START_DATE, DATE AS END_DATE, START_DATE AS CALC_DATE
        FROM PERFTEST t01
        WHERE START_DATE BETWEEN DATE '1900-01-01' AND DATE + 1
        UNION ALL
        SELECT PK, START_DATE, END_DATE, CALC_DATE + INTERVAL '1' DAY AS CALC_DATE
        FROM RecTable
        WHERE CALC_DATE < END_DATE
    )
    SELECT * FROM RecTable
) A
ORDER BY PK, CALC_DATE;
```
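Conceptually, the recursive CTE builds the result one iteration at a time: each pass reads the previous pass's spool and appends one more day per qualifying row, so a row spanning n days takes n passes. A minimal Python sketch of that behavior (an illustration of the iteration pattern, not of Teradata internals):

```python
from datetime import date, timedelta

# Seed step: one row per source row, with CALC_DATE = START_DATE.
seed = [(1, date(2022, 9, 12), date(2022, 9, 15), date(2022, 9, 12))]

result = list(seed)
current = seed
iterations = 0
while current:
    # Recursive step: add one day to CALC_DATE for every row
    # whose CALC_DATE is still below END_DATE.
    current = [(pk, s, e, c + timedelta(days=1))
               for (pk, s, e, c) in current if c < e]
    result.extend(current)
    iterations += 1

print(iterations, len(result))
```

Each iteration touches spool again, which is why I/O grows so sharply with the length of the expanded periods.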
| QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
|-----------|--------------|------------|------------|-----------|---------------|
| RECURSION | 700180812,00 | 38859,82 | 348420681728 | 13.09.2022 19:11:11.830000 | 13.09.2022 19:17:28.820000 |
Below you can see the optimized query, which uses Teradata's EXPAND ON clause:
```sql
SET QUERY_BAND = 'DWH=EXPAND;' FOR SESSION;

SELECT PK, START_DT, DATE AS END_DT, BEGIN(CC) AS CALC_DATE
FROM PERFTEST T01
WHERE START_DT BETWEEN DATE '1900-01-01' AND DATE
EXPAND ON PERIOD(START_DT, DATE + 1) AS CC BY INTERVAL '1' DAY
ORDER BY PK, CALC_DATE;
```
| QueryBand | TotalIOCount | AMPCPUTime | SpoolUsage | StartTime | FirstRespTime |
|-----------|--------------|------------|------------|-----------|---------------|
| EXPAND | 7084187,00 | 11141,82 | 348420681728 | 13.09.2022 19:19:23.120000 | 13.09.2022 19:19:41.360000 |
The optimized query requires 99% fewer I/Os and 71% fewer CPU seconds than the original query; spool usage is unchanged.
This case study shows how huge performance gains can often be achieved in Teradata simply by rewriting queries. The same goal can frequently be reached far more efficiently by changing the SQL, even without optimizer "hints" of the kind Oracle offers.
Would the CPU figures improve a bit by precalculating the EXPAND into a volatile table, CROSS JOINing it, and filtering out the unneeded values?
The idea is to avoid calling EXPAND for each returned record.
As mentioned in a previous article, EXPAND does not work in a subquery, hence the volatile table. But there was also a comment there describing how to fool the system and bypass this limitation.
Maybe that could be possible with XML_AGG as well?
We usually (cross) join a calendar table to such a query; the calendar table contains each date along with the end-of-day or end-of-month date records.
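The calendar-join idea described above could be sketched roughly as follows in Python, with `calendar` standing in for a hypothetical calendar table (names and date range are assumptions for illustration):

```python
from datetime import date, timedelta

# Hypothetical calendar "table": one entry per date in the covered range.
first = date(2022, 9, 1)
calendar = [first + timedelta(days=i) for i in range(30)]

# Source rows: (pk, start_date, end_date).
source = [(1, date(2022, 9, 12), date(2022, 9, 15))]

# Cross join against the calendar, then keep only the dates
# that fall inside each row's period.
expanded = [(pk, s, e, d)
            for (pk, s, e) in source
            for d in calendar
            if s <= d <= e]

for row in expanded:
    print(row)
```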
Good stuff, thank you.