What is a Teradata Query Band?
The DBQL tables provide all the key figures we need to optimize queries. Because a Teradata system typically runs many thousands of queries per day, it is not easy to locate individual queries in the DBQL tables. It is therefore good practice to use the Teradata query band feature to tag queries. If this feature is activated (for performance tuning, preferably for the whole session, although it can also be activated for single transactions), any key/value pairs are stored as text in the column "QueryBand" of DBC.DBQLOGTBL. The general syntax for setting a query band is shown below:
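A minimal sketch of the syntax; the key and value names used here are placeholders, not prescribed names:

```sql
-- Set one or more key/value pairs for the whole session
SET QUERY_BAND = 'ApplicationName=Sales_ETL;JobStep=Load_Customer;' FOR SESSION;

-- Or set it only for the current transaction
SET QUERY_BAND = 'JobStep=Load_Customer;' FOR TRANSACTION;

-- Remove the session query band again
SET QUERY_BAND = NONE FOR SESSION;
```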
How can I change an existing Teradata Query Band?
It is also possible to change a query band afterward with the UPDATE option, or to add key/value pairs to an existing query band. It is not possible to delete a single key/value pair from an existing query band in Teradata; if this is needed, the entire query band must be redefined. The view DBC.SessionInfoV shows the query bands of all active sessions:
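A short sketch of both statements; the key names are illustrative:

```sql
-- Add the key 'JobStep' (or overwrite it if it already exists)
-- while keeping all other key/value pairs of the session query band
SET QUERY_BAND = 'JobStep=Transform;' UPDATE FOR SESSION;

-- Inspect the query bands of all active sessions
SELECT SessionNo, QueryBand
FROM DBC.SessionInfoV;
```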
Reading the key/value pairs from the "QueryBand" column can be done quickly with the GetQueryBandValue function. One parameter specifies which query band to search if more than one exists (0 = return the first match found, 1 = transaction, 2 = session):
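For example, a sketch that extracts a single value from the logged query bands (the key name 'QUERY1' is just an example):

```sql
-- Read the value stored under the key 'QUERY1' from the DBQL log;
-- the second parameter (0) returns the first match found
SELECT QueryID,
       GetQueryBandValue(QueryBand, 0, 'QUERY1') AS QueryBandValue
FROM DBC.DBQLogTbl
WHERE QueryBand IS NOT NULL;
```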
How Query Banding in Teradata can be used for Tuning
Let us assume we have just created an alternative version of an SQL statement that seems to perform better. But right when we show it to a co-worker, its run time is suddenly not so impressive anymore. Were we wrong about our improvement? Here is an easy way to document the performance before and after optimization: we can wrap each query version in its own Teradata query band to generate separate resource usage tracks. The figures can then be read out quickly and used for documentation and presentation purposes. We have to follow these steps:
- We have to ensure that our Teradata SQL Assistant session does not execute any other requests once we have set the query band.
- We need to set the Teradata query band for the non-optimized version of our SQL, run the SQL, and turn the query band off immediately, as shown below:
SET QUERY_BAND = 'QUERY1=VS1;' FOR SESSION;
SELECT * FROM CustomerView;
SET QUERY_BAND = NONE FOR SESSION;
- Next, we repeat the above two steps for our optimized query, using a different query band name, e.g. 'QUERY1=VS2;'
- Finally, we can compare the results:
SELECT QueryBand, NumResultRows, NumSteps, TotalIOCount, AMPCPUTime, ParserCPUTime, NumOfActiveAMPs, MaxCPUAmpNumber, MinAmpIO, MaxAmpIO, MaxIOAmpNumber, SpoolUsage
FROM DBC.DBQLogTbl
WHERE TRIM(QueryBand) LIKE '%QUERY1=%'
AND QueryText LIKE '%SELECT%';
It might take a few minutes until Teradata writes our activity from the cache to the DBC table, so we should not worry if an immediate execution of step 4 returns no rows. We should plan our result presentation around this delay so as not to make our target audience impatient, or we can use the statement FLUSH QUERY LOGGING WITH ALL to flush the query log cache immediately.
Depending on where the focus and expectations of our tuning efforts lie, one measure or the other is more important. Either way, we now have hard proof of our success at hand.
What to consider when creating a Query Band in Teradata
It makes sense in a production environment to store application names, job names, script versions, and other relevant variables (e.g., loading date) in the query band and make them available in DBC.DBQLOGTBL for later performance analysis.
When designing the query bands, we should use a consistent structure to make the analysis possible across the Teradata system’s entire workload.
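A sketch of such a consistent structure; the key names and values below are hypothetical and would be standardized across all jobs of the system:

```sql
-- Hypothetical naming convention applied identically by every load job,
-- so the QueryBand column can be parsed uniformly in later analysis
SET QUERY_BAND = 'ApplicationName=SalesDWH;JobName=Load_Customer;ScriptVersion=1.4;LoadDate=2020-10-20;' FOR SESSION;
```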
Below is the link to the official Teradata documentation:
A Performance Tuning Example with the Teradata Query Band
Typically, reporting scenarios require aggregating several facts for a certain reporting date or a range of dates.
If the tables are historized with a start date and an end date, a frequently seen join scenario is that the driving table is product joined with a calendar table and restricted to the reporting date, as in our example below:
SELECT cal.calendar_date AS mydate, c.*
FROM Client c
INNER JOIN Sys_Calendar.CALENDAR cal
ON cal.calendar_date BETWEEN c.hist_start_date AND c.hist_end_date
WHERE mydate = DATE '2020-10-20';
1) First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for c.
2) Next, we lock c for read.
3) We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES in view CALENDAR by way of the unique primary index "SYS_CALENDAR.CALDATES in view CALENDAR.cdate = DATE '2020-10-20'" with no residual conditions into Spool 2 (all_amps) (compressed columns allowed), which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 390 rows (6,630 bytes). The estimated time for this step is 0.01 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to c by way of an all-rows scan with a condition of ("(c.hist_end_date >= DATE '2020-10-20') AND (c.hist_start_date <= DATE '2020-10-20')"). Spool 2 and c are joined using a product join, with a join condition of ("(cdate >= c.hist_start_date) AND (cdate <= c.hist_end_date)"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1,641,096 rows (1,296,465,840 bytes). The estimated time for this step is 0.78 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.78 seconds.
The resource usage (CPUTime and Disk I/Os) in our example query is the following:
Replacing the Calendar Table with a Query Band
A lesser-known method uses query banding: instead of joining with a calendar table, the reporting date is stored in the session's query band, so the join can be avoided entirely.
SET QUERY_BAND = 'mydate=2020-10-20;' FOR SESSION;

SELECT CAST(GetQueryBandValue(0, 'mydate') AS DATE) AS mydate, c.*
FROM Client c
WHERE CAST(GetQueryBandValue(0, 'mydate') AS DATE FORMAT 'YYYY-MM-DD') BETWEEN c.hist_start_date AND c.hist_end_date
AND mydate = DATE '2020-10-20';
1) First, we lock a distinct "pseudo table" for read on a RowHash to prevent global deadlock for c.
2) Next, we lock c for read.
3) We do an all-AMPs RETRIEVE step from c by way of an all-rows scan with a condition of ("((SYSLIB.getquerybandvalue (0, 'mydate')(DATE, FORMAT 'yyyy-mm-dd')) = DATE '2020-10-20') AND ((c.hist_start_date <= (SYSLIB.getquerybandvalue (0, 'mydate')(DATE, FORMAT 'YYYY-MM-DD'))) AND (c.hist_end_date >= (SYSLIB.getquerybandvalue (0, 'mydate')(DATE, FORMAT 'YYYY-MM-DD'))))") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1,476,987 rows (1,166,819,730 bytes). The estimated time for this step is 0.93 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.93 seconds.
As the query using query banding doesn't require a join step, it needs 20% fewer disk I/Os, but its CPU usage is 330% higher. Which version is better in your environment depends on the kind of performance issue you face (a CPU-bound or an I/O-bound system).
The number of comparisons required for the product join grows in proportion to the number of calendar days being joined. The more reporting dates a query covers, the greater the relative advantage of the query band solution, because the relative reduction in disk I/Os increases accordingly.