How To Speed Up Your PPI Tables With A Secondary Index

Roland Wenzlofsky

April 23, 2023

minutes reading time


Teradata offers various options for designing Partitioned Primary Index Tables. The optimal design choice depends on the anticipated workload.

This article examines the impact of partitioning methods on the access path selected by the Teradata Optimizer, ultimately influencing system performance.

To begin, a test scenario is required, which consists of two identical tables. The sole distinction between the two tables is that one is partitioned while the other is not.

CREATE TABLE Schedule_PPI
(
 ScheduleId BIGINT,
 StartDate DATE FORMAt 'YYYY-MM-DD'
) PRIMARY INDEX (ScheduleId)
PARTITION BY RANGE_N (
StartDate BETWEEN DATE '1900-01-01' AND DATE '2100-12-31' 
EACH INTERVAL '1' MONTH, NO RANGE);
CREATE TABLE Schedule_NPPI
(
	ScheduleId	BIGINT,
	StartDate	DATE FORMAt 'YYYY-MM-DD'
) PRIMARY INDEX (ScheduleId)
;

We load these tables with an identical set of test data:

INSERT INTO Schedule_PPI
SELECT 
 ROW_NUMBER() OVER (ORDER BY 1) AS ScheduleId,
 CALENDAR_DATE
FROM 
(
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
) t01;

COLLECT STATISTICS COLUMN(ScheduleId) ON Schedule_PPI; 
COLLECT STATISTICS COLUMN(StartDate) ON Schedule_PPI; 
COLLECT STATISTICS COLUMN(PARTITION) ON Schedule_PPI;
INSERT INTO Schedule_NPPI
SELECT 
 ROW_NUMBER() OVER (ORDER BY 1) AS ScheduleId,
 CALENDAR_DATE
FROM 
(
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
 UNION ALL
 SELECT * FROM SYS_CALENDAR.CALENDAR
) t01;

COLLECT STATISTICS COLUMN(ScheduleId) ON Schedule_NPPI; 
COLLECT STATISTICS COLUMN(StartDate) ON Schedule_NPPI;

Query Only Via The Partition Columns

Initially, a query will be performed on the PPI and NPPI tables utilizing the primary index within the WHERE condition. The partitioned table will be the first to be processed.

SELECT *
FROM Schedule_PPI
WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';

Let’s examine the execution plan in detail.

Explain 
SELECT *
FROM Schedule_PPI
WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';

  1) First, we lock DWHPRO.Schedule_PPI in TD_MAP1 for read on a
     reserved RowHash in a single partition to prevent global deadlock.
  2) Next, we lock DWHPRO.Schedule_PPI in TD_MAP1 for read on a single
     partition.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from a single partition
     of DWHPRO.Schedule_PPI with a condition of (
     "DWHPRO.Schedule_PPI.StartDate = DATE '2019-10-31'") with a
     residual condition of ("DWHPRO.Schedule_PPI.StartDate = DATE
     '2019-10-31'") into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 10 rows (390 bytes).  The estimated time for this
     step is 0.00 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.00 seconds.

An All-AMP retrieve step is required, but only a single partition is read ((all days of October 2019 are in the same partition).

Teradata PPI
All-AMP – One Partition Access

We will now run the identical query on the NPPI table.

SELECT *
FROM Schedule_NPPI
WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';

Here is the execution plan:

Explain SELECT *
FROM Schedule_NPPI
WHERE StartDate  BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';

  1) First, we lock DWHPRO.Schedule_NPPI in TD_MAP1 for read on a
     reserved RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Schedule_NPPI in TD_MAP1 for read.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from
     DWHPRO.Schedule_NPPI by way of an all-rows scan with a condition
     of ("DWHPRO.Schedule_NPPI.StartDate = DATE '2019-10-31'") into
     Spool 1 (group_amps), which is built locally on the AMPs.  The
     size of Spool 1 is estimated with high confidence to be 10 rows (
     390 bytes).  The estimated time for this step is 0.15 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.15 seconds.

For a Teradata NPPI Table, All-AMP access is required, however, a complete table scan must be conducted.

Teradata Partitioned Primary Index
All-AMP Full Table Scan

In this scenario, the PPI table offers a performance advantage since its rows are locally situated in a partition, negating the need to search through all data blocks.

The Optimizer’s estimation with 0.00 seconds and 0.15 seconds expresses this exceptionally well.

Query Only Via The Primary Index

Let us examine the outcome of querying the primary index without referencing the partition columns.

We will rerun the query on the PPI table.

SELECT *
FROM Schedule_PPI
WHERE ScheduleId = 100;

The optimizer presents the execution plan.

Explain SELECT *
FROM Schedule_PPI
WHERE ScheduleId = 100;

  1) First, we do a single-AMP RETRIEVE step from all partitions of
     DWHPRO.Schedule_PPI by way of the primary index
     "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions
     into Spool 1 (one-amp), which is built locally on that AMP.  The
     size of Spool 1 is estimated with high confidence to be 1 row (39
     bytes).  The estimated time for this step is 0.02 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

With knowledge of the Primary Index, a single AMP Retrieve Step is performed. Nonetheless, Teradata must probe all partitions to determine the existence of a row with the desired primary index.

Teradata Partitioning
One AMP – All Partitions

We will now execute the Primary Index query on the NPPI table.

SELECT *
FROM Schedule_NPPI
WHERE ScheduleId = 100;

The Explain Statement shows us the following execution plan:

Explain SELECT *
FROM Schedule_NPPI
WHERE ScheduleId = 100;

  1) First, we do a single-AMP RETRIEVE step from DWHPRO.Schedule_NPPI
     by way of the primary index "DWHPRO.Schedule_NPPI.ScheduleId = 100"
     with no residual conditions into Spool 1 (one-amp), which is built
     locally on that AMP.  The size of Spool 1 is estimated with high
     confidence to be 1 row (39 bytes).  The estimated time for this
     step is 0.00 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.00 seconds.

The desired row can be located using the rowhash on the AMP.

Teradata Primary Index
One-AMP Primary Index Access

Improvement Of The Access Via Teradata Secondary Index

The Primary Index requires all partition columns to be included in order to be unique.

In the previous example, querying a Teradata PPI table solely with the Primary Index required checking every partition to locate the desired Primary Index value.

Preventing this issue is effortless by utilizing a Unique Secondary Index (USI) on the partitioned table’s primary index columns.

CREATE UNIQUE INDEX (ScheduleId) ON Schedule_PPI;

Let’s examine how this impacts the query’s execution plan.

Explain 
SELECT *
FROM Schedule_PPI
WHERE ScheduleId = 100;

  1) First, we do a two-AMP RETRIEVE step in TD_MAP1 from
     DWHPRO.Schedule_PPI by way of unique index # 4
     "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions.
     The estimated time for this step is 0.01 seconds.
  -> The row is sent directly back to the user as the result of
     statement 1.  The total estimated time is 0.01 seconds.

As a consequence, the subsequent benefits have been realized:

  • No probing of the partitions is necessary
  • A Rowhash lock is used instead of a table lock

How can I improve performance if the Primary Index lacks uniqueness?

A NUSI can be defined on the Primary Index Columns.

CREATE INDEX (ScheduleId) ON Schedule_PPI;
SELECT *
FROM Schedule_PPI
WHERE ScheduleId = 100;
Explain 
SELECT *
FROM Schedule_PPI
WHERE ScheduleId = 100;

  1) First, we do a single-AMP RETRIEVE step from all partitions of
     DWHPRO.Schedule_PPI by way of the primary index
     "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions
     into Spool 1 (one-amp), which is built locally on that AMP.  The
     size of Spool 1 is estimated with high confidence to be 1 row (39
     bytes).  The estimated time for this step is 0.02 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

The NUSI offers benefits similar to the Teradata USI when querying a Teradata PPI table solely through the primary index.

  • No probing of Partitions needed
  • Rowhash locking instead of table lock
  • Multiload into a table with NUSI is possible

How Teradata Queries PPI/NPPI Tables?

How is a Teradata PPI table queried if only the PPI columns are used in the WHERE condition?

With an All-AMP retrieve step accessing relevant partitions only.

How is a Teradata NPPI table queried if the columns used in the WHERE condition are not indexed?

With a full table scan

How is a Teradata PPI table queried if only the Primary Index columns are used in the WHERE condition?

With a Single-AMP Retrieve step probing all partitions

How is a Teradata NPPI table queried if the Primary Index columns are used in the WHERE condition?

With a Single-AMP Primary Index, Direct Access

How can a Teradata PPI table query be improved if it only occurs via the primary index but not the partition columns?

Using a secondary index (USI, NUSI) on the primary index columns.
The index prevents a table-level lock and allows the use of a Rowhash Lock. It also eliminates the need for partition probing.

Additional articles on the Teradata Partitioned Primary Index Table are available:

Teradata Partitioning Strategies Anyone Would Be Proud Of(Opens in a new browser tab)

Teradata Partitioning Strategies Anyone Would Be Proud Of(Opens in a new browser tab)

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>