Using the Teradata Queryband

0
324

teradata querybandTypically, reporting scenarios require to aggregate one ore several facts of a certain reporting date or a range of dates.

If the involved tables are historized with start date and end date, the frequently seen join scenario is that the driving table is product joined with a calendar table and restricted on the reporting date, such as in our example below:

SELECT cal.calendar_date as mydate, c.* FROM Client c INNER JOIN Sys_Calendar.CALENDAR cal ON cal.calendar_date between c.hist_start_date and c.hist_end_date WHERE mydate = DATE ‘2015-05-20’;

1) First, we lock a distinct “pseudo table” for read
on a RowHash to prevent global deadlock for c.
2) Next, we lock c for read.
3) We do a single-AMP RETRIEVE step from SYS_CALENDAR.CALDATES in
view CALENDAR by way of the unique primary index
“SYS_CALENDAR.CALDATES in view CALENDAR.cdate = DATE ‘2015-05-20′”
with no residual conditions into Spool 2 (all_amps) (compressed
columns allowed), which is duplicated on all AMPs.  The size of
Spool 2 is estimated with high confidence to be 390 rows (6,630
bytes).  The estimated time for this step is 0.01 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to c by way of an
all-rows scan with a condition of (“(c.hist_end_date >=
DATE ‘2015-05-20’) AND (c.hist_start_date <= DATE
‘2015-05-20’)”).  Spool 2 and c are joined  using a product join ,
with a join condition of (“(cdate >=
c.hist_start_date) AND (cdate <= c.hist_end_date)”).
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs.  The size of Spool 1 is estimated with no confidence
to be 1,641,096 rows (1,296,465,840 bytes).  The estimated time
for this step is 0.78 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
> The contents of Spool 1 are sent back to the user as the result of
statement 1.  The total estimated time is 0.78 seconds.
The resource usage (CPUTime and Disk IOs) in our example query is the following:

TotalIoCount:  71.588,00
AMPCPUTime: 175,29

Using the Teradata Queryband

A less known method is the following which is making use of query banding: Instead of joining with a calendar table, the reporting date is stored in the session’s query band, and therefore a join can be avoided.

SET QUERY_BAND = ‘mydate=2015-05-20;’ FOR SESSION;

SELECT CAST(getquerybandvalue(0,’mydate’) AS DATE) AS mydate, c.* FROM Client c WHERE CAST(getquerybandvalue(0,’mydate’) AS DATE FORMAT ‘YYYY-MM-DD’)  BETWEEN    c.hist_start_date and c.hist_end_date AND  mydate = DATE ‘2015-05-20’;

1) First, we lock a distinct “pseudo table” for read
on a RowHash to prevent global deadlock for c.
2) Next, we lock c for read.
3) We do an all-AMPs RETRIEVE step from c by way of an
all-rows scan with a condition of (“((SYSLIB.getquerybandvalue (0,
‘mydate’)(DATE, FORMAT ‘yyyy-mm-dd’))= DATE ‘2015-05-20’) AND
((c.hist_start_date <= (SYSLIB.getquerybandvalue (0,
‘mydate’)(DATE, FORMAT ‘YYYY-MM-DD’))) AND
(c.hist_end_date >= (SYSLIB.getquerybandvalue (0,
‘mydate’)(DATE, FORMAT ‘YYYY-MM-DD’))))”) into Spool 1
(group_amps), which is built locally on the AMPs.  The size of
Spool 1 is estimated with no confidence to be 1,476,987 rows (
1,166,819,730 bytes).  The estimated time for this step is 0.93
seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.  The total estimated time is 0.93 seconds.

TotalIoCount:  57.418,00
AMPCPUTime: 581,99

As the query which is using query banding doesn’t require a join step,  20% less disk IOs are needed but CPU usage increased by 330% .

While the query band solution seems to be the second choice, keep in mind that the relation between consumed CPU time and disk IOs will be shifted with the number of different days which are product joined.

The number of compares required for the product join increase exponential with the number of calendar days being joined.

You could, for example, pack more than one date into the query band (see below) , which would be increasing the relative advantage of the second solution, by a relative reduction in disk IOs.

Furthermore, your general system saturation (IO bound or CPU bound) has to be considered as well.

SET QUERY_BAND = ‘mydate=2015-05-20; ‘mydate=2015-05-21;’ FOR SESSION;

 

 

Our Reader Score
[Total: 3    Average: 4.3/5]
Using the Teradata Queryband written by Roland Wenzlofsky on May 27, 2015 average rating 4.3/5 - 3 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here