Teradata offers various options for designing Partitioned Primary Index Tables. The optimal design choice depends on the anticipated workload.
This article examines the impact of partitioning methods on the access path selected by the Teradata Optimizer, ultimately influencing system performance.
To begin, a test scenario is required, which consists of two identical tables. The sole distinction between the two tables is that one is partitioned while the other is not.
CREATE TABLE Schedule_PPI ( ScheduleId BIGINT, StartDate DATE FORMAt 'YYYY-MM-DD' ) PRIMARY INDEX (ScheduleId) PARTITION BY RANGE_N ( StartDate BETWEEN DATE '1900-01-01' AND DATE '2100-12-31' EACH INTERVAL '1' MONTH, NO RANGE);
CREATE TABLE Schedule_NPPI ( ScheduleId BIGINT, StartDate DATE FORMAt 'YYYY-MM-DD' ) PRIMARY INDEX (ScheduleId) ;
We load these tables with an identical set of test data:
INSERT INTO Schedule_PPI SELECT ROW_NUMBER() OVER (ORDER BY 1) AS ScheduleId, CALENDAR_DATE FROM ( SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR ) t01; COLLECT STATISTICS COLUMN(ScheduleId) ON Schedule_PPI; COLLECT STATISTICS COLUMN(StartDate) ON Schedule_PPI; COLLECT STATISTICS COLUMN(PARTITION) ON Schedule_PPI;
INSERT INTO Schedule_NPPI SELECT ROW_NUMBER() OVER (ORDER BY 1) AS ScheduleId, CALENDAR_DATE FROM ( SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR UNION ALL SELECT * FROM SYS_CALENDAR.CALENDAR ) t01; COLLECT STATISTICS COLUMN(ScheduleId) ON Schedule_NPPI; COLLECT STATISTICS COLUMN(StartDate) ON Schedule_NPPI;
Query Only Via The Partition Columns
Initially, a query will be performed on the PPI and NPPI tables utilizing the primary index within the WHERE condition. The partitioned table will be the first to be processed.
SELECT * FROM Schedule_PPI WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';
Let’s examine the execution plan in detail.
Explain SELECT * FROM Schedule_PPI WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31'; 1) First, we lock DWHPRO.Schedule_PPI in TD_MAP1 for read on a reserved RowHash in a single partition to prevent global deadlock. 2) Next, we lock DWHPRO.Schedule_PPI in TD_MAP1 for read on a single partition. 3) We do an all-AMPs RETRIEVE step in TD_MAP1 from a single partition of DWHPRO.Schedule_PPI with a condition of ( "DWHPRO.Schedule_PPI.StartDate = DATE '2019-10-31'") with a residual condition of ("DWHPRO.Schedule_PPI.StartDate = DATE '2019-10-31'") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 10 rows (390 bytes). The estimated time for this step is 0.00 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.00 seconds.
An All-AMP retrieve step is required, but only a single partition is read ((all days of October 2019 are in the same partition).

We will now run the identical query on the NPPI table.
SELECT * FROM Schedule_NPPI WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31';
Here is the execution plan:
Explain SELECT * FROM Schedule_NPPI WHERE StartDate BETWEEN DATE '2019-10-31' AND DATE'2019-10-31'; 1) First, we lock DWHPRO.Schedule_NPPI in TD_MAP1 for read on a reserved RowHash to prevent global deadlock. 2) Next, we lock DWHPRO.Schedule_NPPI in TD_MAP1 for read. 3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.Schedule_NPPI by way of an all-rows scan with a condition of ("DWHPRO.Schedule_NPPI.StartDate = DATE '2019-10-31'") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 10 rows ( 390 bytes). The estimated time for this step is 0.15 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.
For a Teradata NPPI Table, All-AMP access is required, however, a complete table scan must be conducted.

In this scenario, the PPI table offers a performance advantage since its rows are locally situated in a partition, negating the need to search through all data blocks.
The Optimizer’s estimation with 0.00 seconds and 0.15 seconds expresses this exceptionally well.
Query Only Via The Primary Index
Let us examine the outcome of querying the primary index without referencing the partition columns.
We will rerun the query on the PPI table.
SELECT * FROM Schedule_PPI WHERE ScheduleId = 100;
The optimizer presents the execution plan.
Explain SELECT * FROM Schedule_PPI WHERE ScheduleId = 100; 1) First, we do a single-AMP RETRIEVE step from all partitions of DWHPRO.Schedule_PPI by way of the primary index "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions into Spool 1 (one-amp), which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row (39 bytes). The estimated time for this step is 0.02 seconds. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
With knowledge of the Primary Index, a single AMP Retrieve Step is performed. Nonetheless, Teradata must probe all partitions to determine the existence of a row with the desired primary index.

We will now execute the Primary Index query on the NPPI table.
SELECT * FROM Schedule_NPPI WHERE ScheduleId = 100;
The Explain Statement shows us the following execution plan:
Explain SELECT * FROM Schedule_NPPI WHERE ScheduleId = 100; 1) First, we do a single-AMP RETRIEVE step from DWHPRO.Schedule_NPPI by way of the primary index "DWHPRO.Schedule_NPPI.ScheduleId = 100" with no residual conditions into Spool 1 (one-amp), which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row (39 bytes). The estimated time for this step is 0.00 seconds. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.00 seconds.
The desired row can be located using the rowhash on the AMP.

Improvement Of The Access Via Teradata Secondary Index
The Primary Index requires all partition columns to be included in order to be unique.
In the previous example, querying a Teradata PPI table solely with the Primary Index required checking every partition to locate the desired Primary Index value.
Preventing this issue is effortless by utilizing a Unique Secondary Index (USI) on the partitioned table’s primary index columns.
CREATE UNIQUE INDEX (ScheduleId) ON Schedule_PPI;
Let’s examine how this impacts the query’s execution plan.
Explain SELECT * FROM Schedule_PPI WHERE ScheduleId = 100; 1) First, we do a two-AMP RETRIEVE step in TD_MAP1 from DWHPRO.Schedule_PPI by way of unique index # 4 "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions. The estimated time for this step is 0.01 seconds. -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.01 seconds.
As a consequence, the subsequent benefits have been realized:
- No probing of the partitions is necessary
- A Rowhash lock is used instead of a table lock
How can I improve performance if the Primary Index lacks uniqueness?
A NUSI can be defined on the Primary Index Columns.
CREATE INDEX (ScheduleId) ON Schedule_PPI;
SELECT * FROM Schedule_PPI WHERE ScheduleId = 100;
Explain SELECT * FROM Schedule_PPI WHERE ScheduleId = 100; 1) First, we do a single-AMP RETRIEVE step from all partitions of DWHPRO.Schedule_PPI by way of the primary index "DWHPRO.Schedule_PPI.ScheduleId = 100" with no residual conditions into Spool 1 (one-amp), which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row (39 bytes). The estimated time for this step is 0.02 seconds. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
The NUSI offers benefits similar to the Teradata USI when querying a Teradata PPI table solely through the primary index.
- No probing of Partitions needed
- Rowhash locking instead of table lock
- Multiload into a table with NUSI is possible
How Teradata Queries PPI/NPPI Tables?
How is a Teradata PPI table queried if only the PPI columns are used in the WHERE condition?
With an All-AMP retrieve step accessing relevant partitions only.
How is a Teradata NPPI table queried if the columns used in the WHERE condition are not indexed?
With a full table scan
How is a Teradata PPI table queried if only the Primary Index columns are used in the WHERE condition?
With a Single-AMP Retrieve step probing all partitions
How is a Teradata NPPI table queried if the Primary Index columns are used in the WHERE condition?
With a Single-AMP Primary Index, Direct Access
How can a Teradata PPI table query be improved if it only occurs via the primary index but not the partition columns?
Using a secondary index (USI, NUSI) on the primary index columns.
The index prevents a table-level lock and allows the use of a Rowhash Lock. It also eliminates the need for partition probing.
Additional articles on the Teradata Partitioned Primary Index Table are available:
Teradata Partitioning Strategies Anyone Would Be Proud Of(Opens in a new browser tab)
Teradata Partitioning Strategies Anyone Would Be Proud Of(Opens in a new browser tab)