Teradata Tuning Success – The Best ever


As Teradata SQL tuners, we have to deal with skew most of the time. In my experience, up to 90% of SQL performance issues are related to skew. But this article is not about skew. Sometimes a slight change is enough to produce miracles.

In this article, I show the possible impact of wrongly chosen character sets on SQL performance, a topic that is often overlooked or ignored.

How it all began

Recently, while checking the QryLog table, my attention was drawn to a DELETE statement. It was consuming a huge amount of CPU seconds and causing a lot of IOs. In fact, each day this single statement caused more IOs than the entire remaining workload combined.
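Statements like this can be identified directly from DBQL. A query along the following lines surfaces the most expensive requests of the last day (a sketch against the standard DBQL view DBC.QryLogV; the exact columns available depend on your query logging setup):

```sql
-- Top resource consumers from DBQL (sketch; assumes query logging is enabled)
SELECT TOP 10
       QueryID,
       AMPCPUTime,
       TotalIOCount,
       SpoolUsage,
       CAST(QueryText AS VARCHAR(200)) AS QueryTextStart
FROM DBC.QryLogV
WHERE StartTime >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
ORDER BY TotalIOCount DESC;
```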

Even worse, it looked like the resource consumption was growing rapidly (on average, 50 million rows were added each day), as you can see in the chart below:

 

DELETE

 

The DELETE statement in question executes against a huge table, “TheTable”, which contains almost 3 billion rows. A subquery is used to identify the rows to be deleted (please note that the table “TheVolatileTable”, used for filtering, contains exactly one row):

DELETE  FROM TheDatabase.TheTable
WHERE (COL1, COL2, COL3, COL4) IN ( SELECT COL1, COL2, COL3, COL4 FROM TheVolatileTable );

Below you can see the table definitions of both involved tables:

CREATE MULTISET TABLE TheDatabase.TheTable (
PK BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MINVALUE -999999999999999999 MAXVALUE 999999999999999999 NO CYCLE),
Col1 VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
Col2 VARCHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
Col3 DATE FORMAT 'YYYY-MM-DD',
Col4 CHAR(12) CHARACTER SET LATIN NOT CASESPECIFIC,
RefDat DATE FORMAT 'YYYY-MM-DD'
) PRIMARY INDEX ( PK )
PARTITION BY RANGE_N(RefDat BETWEEN '2010-01-01' AND DATE + INTERVAL '365' DAY EACH INTERVAL '1' DAY, NO RANGE, UNKNOWN);

SELECT CAST(COUNT(*) AS BIGINT) FROM TheDatabase.TheTable;
Count(*)
2,928,780,765 -- almost 3 billion rows!

This is the table used for filtering the rows to be deleted (it is populated with exactly 1 row):

CREATE MULTISET VOLATILE TABLE TheVolatileTable AS
(
SELECT
'CODE' AS Col1,
CASE WHEN Col5 = 'X' THEN 'Y' ELSE 'N' END AS Col2,
CAST(SUBSTR(TheDate,1,4) || '-' || SUBSTR(TheDate,5,2) || '-' || SUBSTR(TheDate,7,2) AS DATE FORMAT 'YYYY-MM-DD') AS Col3,
Col5 AS Col4
FROM DummyTable GROUP BY 1,2,3,4
) WITH DATA ON COMMIT PRESERVE ROWS;

SELECT COUNT(*) FROM TheVolatileTable;
Count(*)
1 -- exactly 1 row

The Initial Situation

In the original setup, the volatile table is created with this syntax:
CREATE TABLE AS (<SUBQUERY>);

This is a comfortable method for creating and populating a table in one step. Still, the author of the query overlooked that it is important to take care of data types and character sets. On the affected Teradata system, character columns are created by default with CHARACTER SET UNICODE. Unfortunately, the table “TheTable” defines these character columns with CHARACTER SET LATIN, causing a mismatch.
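If you want to keep the convenient one-step CTAS syntax, the character sets can be pinned inside the subquery itself. A sketch of what the original statement could have looked like (the explicit CASTs and the primary index clause are the only changes):

```sql
CREATE MULTISET VOLATILE TABLE TheVolatileTable AS
(
SELECT
  CAST('CODE' AS VARCHAR(3) CHARACTER SET LATIN) AS Col1,
  CAST(CASE WHEN Col5 = 'X' THEN 'Y' ELSE 'N' END AS VARCHAR(1) CHARACTER SET LATIN) AS Col2,
  CAST(SUBSTR(TheDate,1,4) || '-' || SUBSTR(TheDate,5,2) || '-' || SUBSTR(TheDate,7,2) AS DATE FORMAT 'YYYY-MM-DD') AS Col3,
  CAST(Col5 AS CHAR(12) CHARACTER SET LATIN) AS Col4
FROM DummyTable GROUP BY 1,2,3,4
) WITH DATA PRIMARY INDEX (Col1, Col2, Col3, Col4) ON COMMIT PRESERVE ROWS;
```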

SHOW TABLE TheVolatileTable;

CREATE MULTISET VOLATILE TABLE TheVolatileTable
(
Col1 VARCHAR(3) CHARACTER SET UNICODE NOT CASESPECIFIC,
Col2 VARCHAR(1) CHARACTER SET UNICODE NOT CASESPECIFIC,
Col3 DATE FORMAT 'YYYY-MM-DD',
Col4 CHAR(12) CHARACTER SET UNICODE NOT CASESPECIFIC)
NO PRIMARY INDEX ON COMMIT PRESERVE ROWS;

The resulting execution plan for the poorly performing DELETE statement looks like this:

EXPLAIN DELETE   FROM TheDatabase.TheTable
WHERE (COL1, COL2, COL3, COL4) IN
(
SELECT COL1, COL2, COL3, COL4 FROM TheVolatileTable
);

1) First, we lock a distinct TheDatabase.”pseudo table” for write
on a RowHash to prevent global deadlock for
TheDatabase.TheTable.
2) Next, we lock TheDatabase.TheTable for write.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
TheDatabase.TheTable by way of an all-rows scan
with no residual conditions into Spool 2 (all_amps)
(compressed columns allowed) fanned out into 50 hash join
partitions, which is built locally on the AMPs.  The size of
Spool 2 is estimated with high confidence to be 2,928,780,765
rows (123,008,792,130 bytes).  The estimated time for this
step is 32.77 seconds.

2) We do an all-AMPs RETRIEVE step from
TheVolatileTable by way of an all-rows scan with no
residual conditions into Spool 4 (all_amps), which is
redistributed by the hash code of (
TheVolatileTable.COL4,
TheVolatileTable.COL3,
TheVolatileTable.COL2,
TheVolatileTable.COL1) to all AMPs.  Then
we do a SORT to order Spool 4 by the sort key in spool field1
eliminating duplicate rows.  The size of Spool 4 is estimated
with high confidence to be 2 rows (186 bytes).  The estimated
time for this step is 0.02 seconds.

4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
an all-rows scan into Spool 3 (all_amps) (compressed columns
allowed) fanned out into 50 hash join partitions, which is
duplicated on all AMPs.  The size of Spool 3 is estimated with
high confidence to be 780 rows (72,540 bytes).

5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan.  Spool 2 and Spool 3 are joined using a inclusion
hash join of 50 partitions, with a join condition of (
“((TRANSLATE((COL1 )USING LATIN_TO_UNICODE))=
COL1) AND (((TRANSLATE((COL2 )USING
LATIN_TO_UNICODE))= COL2) AND ((COL3 = COL3) AND
((TRANSLATE((COL4 )USING LATIN_TO_UNICODE))= COL4 )))”).
The result goes into Spool 1 (all_amps), which is redistributed by
the rowkey of (TheDatabase.TheTable.ROWID) to all
AMPs.  Then we do a SORT to partition Spool 1 by rowkey and the
sort key in spool field1 eliminating duplicate rows.  The size of
Spool 1 is estimated with low confidence to be 12,253 rows (
220,554 bytes).  The estimated time for this step is 0.50 seconds.

6) We do an all-AMPs MERGE DELETE to
TheDatabase.TheTable from Spool 1 (Last Use) via the
row id.  The size is estimated with low confidence to be 12,253
rows.  The estimated time for this step is 1.87 seconds.

7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.

-> No rows are returned to the user as the result of statement 1.

There are several issues with this execution plan. First of all, the join columns of the huge table “TheTable” have to be converted from LATIN to UNICODE. This conversion is done once per row: almost 3 billion times! It is an expensive operation that consumes a lot of CPU time.

But that is not all: the huge table is spooled into 50 hash join partitions in preparation for the subsequent join. A lot of resources are needed to spool a table containing billions of rows. Unfortunately, this is the Optimizer's preferred option, because it has to deal with the character set conversion.
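Mismatches like this can be tracked down in the data dictionary before they hurt. In the view DBC.ColumnsV, the CharType column encodes the character set (1 = LATIN, 2 = UNICODE). A sketch (adjust the database and table names to your environment):

```sql
-- List the character set of each character column of a table
SELECT ColumnName,
       CASE CharType WHEN 1 THEN 'LATIN' WHEN 2 THEN 'UNICODE' END AS CharSet
FROM DBC.ColumnsV
WHERE DatabaseName = 'TheDatabase'
  AND TableName = 'TheTable'
  AND CharType IN (1, 2);
```

Running this against both join partners quickly reveals columns whose character sets do not match.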

Below you can see the amount of IOs and CPU seconds being used:

 

TotalIOCount     AMPCPUTime   SpoolUsage
3,585,081,549    71,510.38    1,397,760

The Tuning Approach

The tuning of this query was simple, but very effective. By manually creating the volatile table, I was able to choose character set LATIN for the character columns, matching the character sets of both tables:

CREATE MULTISET VOLATILE TABLE TheVolatileTable
(
Col1 VARCHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
Col2 VARCHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
Col3 DATE FORMAT 'YYYY-MM-DD',
Col4 CHAR(12) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX (Col1, Col2, Col3, Col4) ON COMMIT PRESERVE ROWS;

Below you can see the improved execution plan:

EXPLAIN DELETE  FROM TheDatabase.TheTable
WHERE (COL1, COL2, COL3, COL4) IN
(
SELECT COL1, COL2, COL3, COL4 FROM TheVolatileTable
);

1) First, we lock a distinct TheDatabase.”pseudo table” for write
on a RowHash to prevent global deadlock for
TheDatabase.TheTable.

2) Next, we lock TheDatabase.TheTable for write.

3) We do an all-AMPs RETRIEVE step from TheVolatileTable by
way of an all-rows scan with no residual conditions into Spool 3
(all_amps), which is built locally on the AMPs.  Then we do a SORT
to order Spool 3 by the sort key in spool field1 (
TheVolatileTable.Col1,
TheVolatileTable.Col2,
TheVolatileTable.Col3,
TheVolatileTable.Col4) eliminating duplicate rows.
The size of Spool 3 is estimated with high confidence to be 2 rows
(124 bytes).  The estimated time for this step is 0.02 seconds.

4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 2 (all_amps) (compressed columns
allowed), which is duplicated on all AMPs.  The size of Spool 2 is
estimated with high confidence to be 780 rows (48,360 bytes).

5) We do an all-AMPs JOIN step from TheDatabase.TheTable
by way of an all-rows scan with no residual conditions, which is
joined to Spool 2 (Last Use) by way of an all-rows scan.
TheDatabase.TheTable and Spool 2 are joined using a
inclusion dynamic hash join, with a join condition of (
“(TheDatabase.TheTable.Col1 =
Col1) AND ((TheDatabase.TheTable.Col2
= Col2) AND ((TheDatabase.TheTable.Col3 =
Col3) AND (TheDatabase.TheTable.Col4
=Col4 )))”).  The result goes into Spool 1 (all_amps), which
is built locally on the AMPs.  Then we do a SORT to partition
Spool 1 by rowkey and the sort key in spool field1 eliminating
duplicate rows.  The size of Spool 1 is estimated with low
confidence to be 12,253 rows (220,554 bytes).  The estimated time
for this step is 13.06 seconds.

6) We do an all-AMPs MERGE DELETE to
TheDatabase.TheTable from Spool 1 (Last Use) via the
row id.  The size is estimated with low confidence to be 12,253
rows.  The estimated time for this step is 1.87 seconds.

7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.

-> No rows are returned to the user as the result of statement 1.

In the improved execution plan, the Teradata Optimizer performs a dynamic hash join. A dynamic hash join doesn't spool the large table to rehash the join columns; it calculates the row hash “on the fly”, row by row. No spooling of the 3 billion rows is required!

This small change reduced the resource consumption immensely:

            IO Count         CPU Time    SpoolUsage
Tuned       1,335,246        2,916.23    599,040
Original    3,585,081,549    71,510.38   1,397,760
Reduced to  0.04%            4.08%       42.86%

 

These are impressive numbers: 99.96% fewer IOs and about 96% less CPU time. Even spool usage was more than halved!

This tuning success shows how huge improvements can be achieved with very little effort. Always identify your worst workload first and try to improve it. Tuning your worst queries should always be one of your major goals.

Teradata Tuning Success – The Best ever, written by Roland Wenzlofsky on September 18, 2015.
