Experienced Teradata users are familiar with the concept of a primary index. However, a new term has surfaced with the introduction of Teradata Release 15: Primary AMP Index.
This blog post will demonstrate a Primary AMP’s benefits and optimal usage for improving load and query performance.
Initially, let us examine Teradata’s approaches throughout the years in distributing data evenly among all AMPs.
The Teradata Primary Index
Initially, solely the Primary Index existed, establishing the table’s columns that generate a hash value, ultimately designating which AMP manages each row.
The hash value is utilized to sort the rows of primary index tables.
Utilizing the Primary Index and sorting by hash value enables the Optimizer to restrict the row search to the corresponding AMPs.
By sorting the rows using the hash value, the AMP can effortlessly locate them through a binary search within the data blocks.
Teradata can join two AMP-local tables without redistributing rows if they have the same primary index.
The Teradata NOPI Table
Column partitioning mandates using NOPI tables, which allow for creating tables without a primary index. This addition was made a few years ago.
The NOPI tables are distributed evenly and randomly across all AMPs.
To locate rows, one must search through all AMPs, as specific hash values are not assigned to them.
However, the story does not conclude here. Additionally, a hash value is required to locate the rows within the data blocks, making binary search unfeasible. Thus, Teradata is compelled to read all data blocks.
NOPI Tables are commonly employed in staging since bulk load utilities, such as Fastload, exclusively distribute rows uniformly across all AMPs using a round-robin method, eliminating the need to sort them according to hash values.
Before Teradata Release 15, column partitioning was exclusively available through this method.
Column partitioning’s sole benefit lies in limiting IOs to only the necessary columns, as a full table scan is invariably required.
To join a NOPI table, row redistribution is necessary.
The Teradata Primary AMP Index
The Primary AMP Index resulted from issues identified with NOPI tables.
Skewed tables are uncommon for NOPI tables, although they may occur when transferring data from a skewed table through an INSERT…SELECT command. This process inserts rows into the local AMP of the NOPI table.
To address this issue, a new syntax has been introduced to facilitate the redistribution of rows based on a hash value.
Since Teradata 15, columnar tables can now have a primary index, rendering NOPI tables less significant.
The Primary AMP Index was created to merge the benefits of the Primary Index, which offers singular AMP access, binary search, and AMP-local join, with the advantages of the NOPI table, which eliminates the need for sorting rows by hash value during loading.
How the Primary AMP Index Works
How are the Rows of the Primary AMP Index distributed?
As with the Primary Index, the AMP responsible for a row is determined by hashing.
How is the data of the Primary AMP Index stored on the AMPs?
The data is not sorted. This means that a binary search in the data blocks is not possible. The AMP must search all data blocks.
When can the Primary AMP Index be used?
We can only use the Primary AMP Index for tables partitioned by columns.
What is the advantage of the Primary AMP Index over a Primary Index?
Rows do not have to be sorted on the AMPs. Bulk loading is, therefore, faster, for example.
What is the advantage of the Primary AMP Index compared to a NOPI table?
AMP-local joins are possible. Even if the hash value is not stored with every row, the Optimizer knows the hashed columns and can use this information for AMP-local joins.
So you get the advantage of high-performance joins column elimination and no need to sort the rows during loading.
What should I use for Column Partitioned Tables? PI, NOPI, or Primary AMP Index?
I would start with Primary AMP Index Tables because they have advantages for loading and executing queries (Join performance).
But as always, this design decision can vary depending on the application.
Could you elaborate on the disadvantages of Primary Amp?
when it should NOT be used, except for some tactical queries maybe (because all the data blocks of the AMP must be searched)?
Very useful! thank you!