A great Performance Trick for the Teradata PPI

1
901
teradata partitioned primary index

teradata partitioned primary index

 Get a Grip on clumpy Row Distribution – the Teradata PPI Table

For this trick to function properly, the partition column(s) must not be part of the Teradata Primary Index.

At some point in time, we all experienced this problem with table inserts: Although the table rows are distributed evenly across all AMPs, the merge step into the table seems to take forever.

One very likely reason is that we are confronted with the scenario of a SET table and many identical NUPI values. The AMPs are forced to do a duplicate row check for each inserted row.  This process can take an enormous amount of time and waste a lot of CPU cycles as each new row is compared in full against all the existing rows having the same Primary Index value.

While this may be solved by switching over to a MULTISET design in the case of nonpartitioned tables, we have another great opportunity to address this problem with Teradata partitioned tables (PPI tables).

Assume we have the following table design like below:

CREATE TABLE MultiLevel_Example
(

  the_code INTEGER NOT NULL,
  the_date DATE  NOT NULL,

  the_value DECIMAL(18,0)
) UNIQUE PRIMARY INDEX (the_code)
PARTITION BY (
RANGE_N(the_date BETWEEN DATE ‘2015-01-01‘ AND DATE ‘2015-12-31’ EACH INTERVAL ‘1’ MONTH), RANGE_N(the_code BETWEEN 1 AND 5 EACH 1)
);

As you can see, the partition columns are not part of the Primary Index – this is the prerequisite for the trick to work!

Below is the schematic draw of the internal presentation of the table:

teradata ppi

The column labeled “C” is the combined partition, “L1” the first level of partitioning and “L2” the second tier. The column labeled “H” represents the ROWHASH of “The_Code.”

“Whenever we create a PPI table which does not contain all partitioning columns in the Primary Index definition, equal NUPI values are spread across different partitions, reducing the clumps and therefore improving performance as the number of rows to be compared for each duplicate row check becomes significantly smaller.”

Nevertheless, be aware that having the partition columns not in the Primary Index comes as well with a penalty and you have to evaluate advantages and disadvantages:

In the case of a primary index access, all partitions have to be checked to find the searched Primary Index value. Furthermore, merge joins may be executed slower and aggregations are more costly as they have to take place across partitions.

Let us take a short look why this trick will not work for a Teradata PPI table in case that all partition columns are contained in the Primary Index definition:

CREATE TABLE MultiLevel_Example

the_code
INTEGER NOT NULL,
the_date DATE  NOT NULL,

  the_value DECIMAL(18,0)

) UNIQUE PRIMARY INDEX (the_code,the_date)
PARTITION BY (
RANGE_N(the_date BETWEEN DATE ‘2015-01-01‘ AND DATE ‘2015-12-31’ EACH INTERVAL ‘1’ MONTH), RANGE_N(the_code BETWEEN 1 AND 5 EACH 1)
);

Take a look at below illustration. As The ROWHASH is calculated over all partition columns, it is by design impossible that the same ROWHASH ends up in different partitions (take your time to prove this):

teradata ppi

That’s all. I hope you enjoyed it. Feel free to ask any questions and let me know if something is unclear.

 

Our Reader Score
[Total: 12    Average: 4.3/5]
A great Performance Trick for the Teradata PPI written by Roland Wenzlofsky on March 6, 2015 average rating 4.3/5 - 12 user ratings

1 COMMENT

  1. While performing a huge java transaction with insertion queries in batches .Getting AMP lock table has been overflowed SQL Exception .how to solve it ? any settings need to be changed in dbscontrol ?

LEAVE A REPLY

Please enter your comment!
Please enter your name here