A Teradata Tuning Triumph (Guest Post by Martin Murtenthaler, March 30, 2015)

Sometimes, Teradata tuning is a task that demands a high level of creativity.

The case below demonstrates how query rewriting can turn a poorly performing query into a highly performing one, even though the solution does not jump out at you immediately.

Consider the following scenario: we have two tables, a big one and a small one:

CREATE MULTISET TABLE DB1.smalltable
  (
  CLIENT_ID INTEGER,
  CALL_START_DATE DATE FORMAT 'YYYY-MM-DD',
  HARDWARE_ID CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)
  PRIMARY INDEX ( CLIENT_ID );
CREATE MULTISET TABLE DB2.bigtable
  (
  CLIENT_ID INTEGER,
  START_DATE DATE FORMAT 'YYYY-MM-DD',
  END_DATE DATE FORMAT 'YYYY-MM-DD',
  HARDWARE_ID CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)
  PRIMARY INDEX ( CLIENT_ID )
  PARTITION BY CASE_N(END_DATE = DATE '3999-12-31', END_DATE = DATE '3999-12-30', NO CASE OR UNKNOWN);

Both tables have the same Primary Index, but “bigtable” is partitioned by the historization end date, while “smalltable” is not partitioned.

At first glance, it looked like a straightforward task: optimize a DELETE statement, which required about 35 minutes to finish.

DELETE a
  FROM
  DB1.smalltable a,
  DB2.bigtable c
  where
  a.call_start_date between c.START_DATE and c.END_DATE and
  a.CLIENT_ID=c.CLIENT_ID and
  a.hardware_id=c.hardware_id;

Real-time monitoring revealed the cause of the problem: skew.
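
The skew can also be confirmed directly on the table. Here is a minimal sketch (nothing assumed beyond the DDL above) that uses Teradata's standard hash functions to count how many of the small table's rows land on each AMP:

SELECT
  HASHAMP(HASHBUCKET(HASHROW(CLIENT_ID))) AS amp_no,  -- AMP owning this row hash
  COUNT(*) AS rows_on_amp
FROM DB1.smalltable
GROUP BY 1
ORDER BY rows_on_amp DESC;

A single AMP holding a disproportionate share of the rows is the classic signature of a skewed Primary Index column.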

A detailed look at the statistics information revealed a severely skewed CLIENT_ID on table “smalltable”:

COLLECT STATISTICS
  COLUMN ( CLIENT_ID )
  ON DB1.smalltable
  VALUES
  (
  /** SummaryInfo **/
  /* Data Type and Length: 'I:4' */
  /* TimeStamp */ TIMESTAMP '2015-03-30 10:55:22-00:00',
  /* Version */ 5,
  /* OriginalVersion */ 5,
  /* DBSVersion */ '14.10.04.09',
  /* UsageType */ 'D',
  /* ComplexStatInfo */ 'ComplexStatInfo',
  /* NumOfBiasedValues */ 54,
  /* NumOfEHIntervals */ 200,
  /* NumOfHistoryRecords */ 0,
  /* SamplePercent */ 0.00,
  /* NumOfNulls */ 0,
  /* NumOfAllNulls */ 0,
  /* NumOfPartialNullVals */ 0,
  /* PartialNullHMF */ 0,
  /* AvgAmpRPV */ 0.000000,
  /* MinVal */ 1250347405,
  /* MaxVal */ 1770591815,
  /* ModeVal */ 1266076019,
  /* HighModeFreq */ 1316929,
  /* NumOfDistinctVals */ 61302,
  /* NumOfRows */ 1744994,
  /* CPUUsage */ 0.000000,
  /* IOUsage */ 0.000000,
  /* Reserved */ 0,
  /* Reserved */ 0,
  /* Reserved */ 0.000000,
  /* Reserved */ 0.000000,
  /* Reserved */ '',
  /** Biased: Value, Frequency **/
  /* 1 */ 1111111101, 1788,
  /* 2 */ 1111111102, 580,
  /* 3 */ 1111111103, 1826,
  /* 4 */ 1111111104, 810,
  /* 5 */ 1111111105, 1795,
  /* 6 */ 1111111106, 618,
  /* 7 */ 1111111107, 531,
  /* 8 */ 1111111108, 516,
 
  /* 9 */ 1111111111, 1316929, -- HIGHLY SKEWED VALUE !!!
  /* 10 */ 1111111109, 543,
  /* 11 */ 1111111110, 629,
  /* 12 */ 1111111112, 1111,
  /* 13 */ 1111111113, 1079,
  /** Interval: MaxVal, ModeVal, ModeFreq, LowFreq, OtherVals, OtherRows **/
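
The HighModeFreq of 1,316,929 out of 1,744,994 rows says it all: a single CLIENT_ID accounts for roughly three quarters of “smalltable”. The same can be verified without collected statistics by a simple aggregation (a sketch against the table defined above):

SELECT
  CLIENT_ID,
  COUNT(*) AS row_cnt
FROM DB1.smalltable
GROUP BY CLIENT_ID
ORDER BY row_cnt DESC;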

One might assume the chosen execution plan is great: a sliding-window merge join without any redistribution steps. Nevertheless, the skewed value in column CLIENT_ID led to terrible runtime behavior:

1) First, we lock a distinct DB1.”pseudo table” for write on a RowHash to prevent global deadlock for DB1.a.

2) Next, we lock a distinct DB2.”pseudo table” for read on a RowHash to prevent global deadlock for DB2.c.

3) We lock DB1.a for write, and we lock DB2.c for read.

4) We do an all-AMPs JOIN step from DB2.c by way of a RowHash match scan with no residual conditions, which is joined to DB1.a by way of a RowHash match scan with no residual conditions. DB2.c and DB1.a are joined using a sliding-window merge join, with a join condition of ( “(DB1.a.CLIENT_ID = DB2.c.CLIENT_ID) AND ((DB1.a.HARDWARE_ID = DB2.c.HARDWARE_ID) AND ((DB1.a.CALL_START_DATE >= DB2.c.START_DATE ) AND (DB1.a.CALL_START_DATE <= DB2.c.END_DATE )))”). The result goes into Spool 1 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 1 by the hash code of ( DB1.a.ROWID) the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 1,749,441 rows (31,489,938 bytes). The estimated time for this step is 2.20 seconds.

5) We do an all-AMPs MERGE DELETE to DB1.a from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to be 1,749,441 rows. The estimated time for this step is 16 minutes and 34 seconds.

6) We spoil the parser’s dictionary cache for the table.

7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

My primary goal, therefore, was to get the optimizer to adopt a different execution plan. After a couple of trials, I came up with the following solution:

delete a
  from
  DB1.smalltable a,
  DB2.bigtable c
  where
  a.call_start_date between c.START_DATE and c.END_DATE and
  coalesce(a.CLIENT_ID,a.CLIENT_ID)=c.CLIENT_ID and
  a.hardware_id=c.hardware_id and
  exists(select 1 from DB1.smalltable x where x.CLIENT_ID=c.CLIENT_ID);
Luckily, the execution plan changed, shortening the query runtime to 25 seconds:

1) First, we lock a distinct DB1.”pseudo table” for write on a RowHash to prevent global deadlock for DB1.a.

2) Next, we lock a distinct DB2.”pseudo table” for read on a RowHash to prevent global deadlock for DB2.c.

3) We lock DB1.a for write, and we lock DB2.c for read.

4) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from DB1.a by way of an all-rows scan with a condition of (“NOT (DB1.a.HARDWARE_ID IS NULL)”) into Spool 2 (all_amps), which is redistributed by the hash code of ( DB1.a.HARDWARE_ID, (CASE WHEN (NOT (DB1.a.CLIENT_ID IS NULL )) THEN (DB1.a.CLIENT_ID) ELSE (DB1.a.CLIENT_ID) END )(INTEGER)) to all AMPs. The size of Spool 2 is estimated with high confidence to be 1,744,994 rows ( 82,014,718 bytes). The estimated time for this step is 0.25 seconds. 2) We do an all-AMPs JOIN step from DB2.c by way of an all-rows scan with no residual conditions, which is joined to DB1.x by way of an all-rows scan with no residual conditions. DB2.c and DB1.x are joined using a sliding-window inclusion merge join, with a join condition of (“(DB1.x.CLIENT_ID = DB2.c.CLIENT_ID) AND (NOT (DB2.c.CLIENT_ID IS NULL ))”). The result goes into Spool 3 (all_amps), which is redistributed by the hash code of (DB2.c.CLIENT_ID, DB2.c.HARDWARE_ID) to all AMPs. The size of Spool 3 is estimated with low confidence to be 12,265,080 rows (551,928,600 bytes). The estimated time for this step is 7.92 seconds.

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Spool 3 (Last Use) by way of an all-rows scan. Spool 2 and Spool 3 are joined using a single partition hash join, with a join condition of (“((( CASE WHEN (NOT (CLIENT_ID IS NULL )) THEN (CLIENT_ID) ELSE (CLIENT_ID) END ))= CLIENT_ID) AND ((CALL_START_DATE <= END_DATE ) AND ((CALL_START_DATE >= START_DATE ) AND (HARDWARE_ID = HARDWARE_ID )))”). The result goes into Spool 1 (all_amps), which is redistributed by the hash code of ( DB1.a.ROWID) to all AMPs. Then we do a SORT to order Spool 1 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with low confidence to be 1,744,994 rows (31,409,892 bytes). The estimated time for this step is 18.09 seconds.

6) We do an all-AMPs MERGE DELETE to DB1.a from Spool 1 (Last Use) via the row id. The size is estimated with low confidence to be 1,744,994 rows. The estimated time for this step

Conclusion:

At times, you have to keep an open mind about all kinds of alternatives when solving performance issues by query rewriting.

By rewriting this query, I could force the optimizer to use a sliding-window inclusion merge join instead of a regular sliding-window merge join. The COALESCE construct combined with the EXISTS subquery greatly improved the execution plan, leading to a speedup factor of 70!
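
For reference, here is the rewrite pattern distilled into a generic, annotated template (a sketch only; the table and column names below are placeholders, not the real objects used above):

-- Generic sketch of the rewrite; all names are placeholders.
DELETE s
  FROM
  small_tbl s,
  big_tbl b
  WHERE
  s.event_date between b.start_date and b.end_date and
  -- COALESCE(col, col) is logically a no-op, but the Parsing Engine cannot hash
  -- on the expression, so the skew-prone direct sliding-window merge join is blocked:
  coalesce(s.client_id, s.client_id) = b.client_id and
  s.hw_id = b.hw_id and
  -- The EXISTS preselect tells the optimizer that only client_ids present in the
  -- small table matter, opening the door to the less skew-sensitive
  -- sliding-window inclusion merge join:
  exists(select 1 from small_tbl x where x.client_id = b.client_id);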

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.

9 COMMENTS

  1. I think the optimizer does a poor job in this case or the join implementation in TD is not ok. I came across this situation a couple of times even without much skew. There is something wrong with the sliding window merge join.

  2. Hello Nitin,

    thanks for your comment.
    Regarding coalesce(a.id,a.id): what you mention is right.
    The optimizer is prevented from doing a direct row hash match scan, because it does not know that this expression is equal to a.id.

    Have a good day
    Best regards
    Martin

  3. Hi Nitin.
    I am actually working on an article about join techniques on Teradata. It will also cover the advanced strategies for PPI tables, like sliding window, product join with DPE, rowkey-based merge join, etc. I will create infographics showing which options the optimizer has in which situation.

  4. @Prathap: You are right that coalesce returns the first non-null value. My confusion was how repeating the same column inside coalesce, in this case CLIENT_ID, would help.

    @Martin & Roland: Thanks, guys, for the clear explanation. So this is an exception to the basic performance tuning rule that we should avoid functions on join columns. In this case, by actually applying COALESCE we get a better explain plan. Is that correct?

    @Roland: I am looking for a TD post which explains the join strategies we see in the explain plan. I know about the basic strategies (MERGE, HASH, NESTED, EXCLUSION); however, I am looking for info on the sliding window, inclusion, etc. join keywords we see in the explain plan. Please share a link if you have already covered these topics in any of your posts.

    Cheers
    Nitin

  5. With the EXISTS subquery, the join is split into a rough one, which preselects the records of the big table. The fine join can then take advantage of the dramatically reduced number of remaining big-table records. Redistribution becomes possible, and with it a hash join (without duplicating the small table to all AMPs).
    The coalesce construction was necessary because without it the optimizer would have done the fine join as a merge join, which takes so long.

  6. Hi Martin

    “coalesce(a.CLIENT_ID,a.CLIENT_ID)”

    Is this a typo?
    Or did I miss something here?
    How is this coalesce helpful here? Please explain.
    Regards
    Nitin

    • Hi Martin, thanks for your post. I'm new to TD and would like to learn more about performance issues from you. Please keep posting.

      Hi Nitin,

      As far as I know, the coalesce function returns the first non-null value from a list of values. In this query there may be a chance of null CLIENT_ID values, which could be a time-consuming factor when you compare with the big table in the join condition.
      So by using the coalesce function you always join on non-null values instead of nulls.
      Ex:
      coalesce ( ‘ram’,’prathap’, ‘ ‘,’nitin’,’martin’) – o/p : ram
      coalesce ( ‘ ‘,’prathap’, ‘ ‘,’nitin’,’martin’) – o/p : prathap
      coalesce ( ‘ ‘,’ ‘, ‘ ‘,’nitin’,’martin’) – o/p : nitin

      Martin, please correct me if I am wrong.

    • Hello Nitin, hi Prathap,

      coalesce(a.client_id,a.client_id) is exactly the same as a.client_id alone, but the optimizer does not know this. 🙂
      Because of this, it cannot do a direct merge join. It would first have to retrieve, redistribute and sort by coalesce(a.client_id,a.client_id). The optimizer considers this too much work and chooses a different plan.

      BR Martin

      • Hi Martin, as far as I can see, you take away the optimizer's option to execute a direct sliding-window merge join by applying the COALESCE function in the join condition. The Parsing Engine can't hash if a function is applied to a (join) column (apart from a few exceptions like UPPERCASE, LOWERCASE, etc.), so the direct merge join is no longer possible. As there is one skewed value in column CLIENT_ID, the AMP holding this value would have to do a lot of work during the sliding-window merge join, which would lead to an unbalanced query workload.

        The EXISTS subquery gives the optimizer the information that you are only interested in CLIENT_IDs existing in “smalltable”. Based on this information, the optimizer chooses a sliding-window inclusion merge join, which works in such a way that each row hash in the left table is looked up in the right table. As soon as the row hash is found once, the next row hash value is looked up. Basically, this means the skewed value is only accessed one time and the join does not skew.

        In summary, two optimization techniques work together here:

        Martin took away the optimizer's option to apply a join technique which led to skewing.

        Rewriting the simple INNER JOIN into an EXISTS subquery allowed the optimizer to use the less skew-sensitive sliding-window inclusion merge join.
