Teradata Tuning with Query Rewriting
This article explores how to enhance the efficiency of SQL statements through rewriting techniques. This is based on a recent project scenario.
This SQL statement had a lengthy runtime and required further analysis:
,CASE WHEN t02.PK IS NULL THEN 'NEW' ELSE 'CHANGED' END AS Type_Indicator
t01.PK = t02.PK AND
AND PK2.TheDate <> DATE'9999-12-31'
PK1.TheDate = DATE'9999-12-31'
GROUP BY 1,2,3...,59,60;
The table we choose from has the following format:
CREATE TABLE TheTable
PK INTEGER NOT NULL,
TheDate DATE FORMAT 'YYYY-MM-DD',
) PRIMARY INDEX(PK);
We are choosing from a table that stores one entry for each item and reference date (with columns “PK” and “TheDate”). The table consists of roughly 150 million rows and numerous columns.
The objective of the query is to retrieve the latest records for each object, identifiable by the date ‘9999-12-31’, and indicate whether prior records exist. To distinguish between objects with and without history, we utilize the “Type_Indicator” column, which classifies them as either “OLD” or “NEW”.
Upon analyzing the SQL statement, it yields the desired result. However, the query requires 2 hours to complete.
Our initial step is to examine each component of the query and pinpoint any flaws in the implementation strategy.
During a self-join, Teradata repeatedly retrieves each recent row whenever it matches a historical row. To eliminate these duplicates, a GROUP BY statement is necessary. Despite the join being AMP-local due to a matching Primary Index, a costly redistribution step is still required for the GROUP BY statement. Additionally, the spool must be sorted by all 60 columns to enable grouping.
The expenses surge as more rows and columns are grouped together.
After identifying the query’s weaknesses, let us explore methods to enhance it.
To enhance the process, we can efficiently detect the presence of at least one historical row without including all historical rows in the current query.
, CASE WHEN t02.PK IS NULL THEN 'NEW' ELSE 'CHANGED' END AS Type_Indicator
SELECT DISTINCT PK
WHERE TheDate <> DATE'9999-12-31'
t01.PK = t02.PK AND
PK1.TheDate = DATE'9999-12-31';
The revised query optimizes by transferring the comparison logic to a subquery aggregating the primary key column while leaving everything else unaltered.
The new query has a noteworthy advantage:
Utilizing the DISTINCT function and an AMP-local join completes the aggregation process without the need for redistribution, resulting in a more optimized query. This contrasts the previous inefficient query, which necessitated a costly redistribution step and sorting of 60 columns instead of just one.
Effective performance tuning requires measurable indicators of success. The following are recommended:
Disk IOs old: 1,878,357
Disk IOs new: 163,532
We were able to cut disk IOs by almost 91%!
The situation is the same for the necessary spool space.
Spool Space old: approx. 16.3 GB
Spool Space new: approx. 3.3 GB
We decreased spool consumption by around 80%!
Our tuning example demonstrated how minor query modifications can significantly impact resource consumption.