This article presents a case study from my previous projects. Specifically, it showcases an implementation that generates rows for all days between a start date and an end date based on a primary key. To illustrate the issue, I have simplified the query. The following example serves as a demonstration:

Primary KeyStart DateEnd Date
12022-09-122022-09-15
Primary KeyStart DateEnd DateCalculated Date
12022-09-122022-09-152022-09-12
12022-09-122022-09-152022-09-13
12022-09-122022-09-152022-09-14
12022-09-122022-09-152022-09-15

Allow me to acquaint you with the implementation I optimized. Initially, the problem was resolved using a recursive query, which performed favorably. However, as the data expanded significantly, the query’s runtime escalated, substantially increasing.

SET QUERY_BAND = ‘DWHPRO=RECURSION;’ FOR SESSION;

SELECT PK, START_DATE, END_DATE, CALC_DATE
FROM (
WITH RECURSIVE RecTable (PK,START_DATE,END_DATE,CALC_DATE) AS
(

SELECT t01.PK,t01.START_DATE, DATE AS END_DATE, START_DATE AS CALC_DATE
FROM PERFTEST t01
WHERE START_DATE BETWEEN DATE’1900-01-01′ AND DATE+1

UNION ALL

SELECT PK,START_DATE,END_DATE,CALC_DATE + INTERVAL ‘1’ DAY AS CALC_DATE
FROM RecTable
WHERE (CALC_DATE < END_DATE )
)
SELECT * FROM RecTable
) A
ORDER BY PK, CALC_DATE;

QueryBandTotalIOCountAMPCPUTimeSpoolUsageStartTimeFirstRespTime
RECURSION700180812,0038859,8234842068172813.09.2022 19:11:11.83000013.09.2022 19:17:28.820000

Below you can see the optimized query:

SET QUERY_BAND = ‘DWH=EXPAND;’ FOR SESSION;

SELECT PK, START_DT, DATE AS END_DT, BEGIN(CC) AS CALC_DATE
FROM PERFTEST T01
WHERE START_DT BETWEEN DATE’1900-01-01′ AND DATE
EXPAND ON PERIOD(START_DT,DATE+1) AS CC BY INTERVAL ‘1’ DAY
ORDER BY PK,CALC_DATE

QueryBandTotalIOCountAMPCPUTimeSpoolUsageStartTimeFirstRespTime
EXPAND7084187,0011141,8234842068172813.09.2022 19:19:23.12000013.09.2022 19:19:41.360000

The improved query necessitates 99% fewer IO operations than the initial query and 71% fewer CPU seconds. Spool utilization has remained constant.

The case study highlighted Teradata’s potential for significant performance improvements through query optimization. Altering the SQL itself can often yield greater efficiency without using Oracle’s “hints.”

  • Would the CPU figures improve a bit by precalculating the EXPAND in a volatile, CROSS JOIN, and filter out unneeded values?

    The idea is to avoid calling EXPAND for each returned record.

    As mentioned in a previous article, EXPAND does not work as a sub query, the volatile hence. But there was also a comment there how to fool the system and bypass this limitation.

  • Maybe that could be possible with XML_AGG as well?
    We usually (cross) join a calendar table to such a query which contains each date and the end of day or end of month date records.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >