Anyone who has been dealing with Teradata for a long time knows what a primary index is. But at some point, a new term crept in with Teradata Release 15: Primary AMP Index.
In this blog post, I will show you what it is, and when a Primary AMP is best used to improve the performance of your loads and queries.
Let’s first look at the strategies that Teradata has developed over the years to distribute data equally across all AMPs.
The Teradata Primary Index
At the very beginning, only the Primary Index was available. The Primary Index defines the columns of a table from which a hash value is calculated. This hash value finally determines which AMP is responsible for which rows.
The rows of the tables are sorted by this hash value for primary index tables.
The Primary Index together with the sorting by hash value allows the Optimizer to limit the search for rows to the AMPs which contain the rows.
Since the rows are sorted by the hash value, the rows can be easily found by the AMP using a binary search in the data blocks.
Joins of two tables with the same primary index can be performed AMP-local without redistributing rows.
The Teradata NOPI Table
Some time a few years ago the possibility to create tables without primary index was added. NOPI tables were mandatory if column partitioning was used.
The rows of NOPI tables are randomly but evenly distributed across all AMPs.
Since there is no hash value assigned to specific AMPs, finding rows is only possible by searching for them among all AMPs.
But this is not the end of the story. Also, the finding of the rows within the data blocks cannot be done by binary search without a hash value. This means that all data blocks must be read.
NOPI Tables are mainly used in staging because the bulk load utilities (fastload etc.) do not have to sort the rows by the hash value but only distribute them evenly over all AMPs in a round-robin procedure.
In addition, before Teradata Release 15 it was the only way to use column partitioning.
The only advantage of column partitioning was the restriction of the IOs to the required columns because a full table scan was always necessary.
Joining a NOPI table always requires the redistribution of the rows.
The Teradata Primary AMP Index
The Primary AMP Index is a result of problems recognized by the use of NOPI tables.
NOPI tables are typically not skewed, but it can happen when an INSERT…SELECT of a skewed table into a NOPI table is done. In this case, the rows are inserted into the NOPI table AMP-local.
In order to solve this problem, the possibility of redistributing the rows according to a hash value has been created (using a new syntax).
In the meantime (since Teradata 15), columnar tables can also have a primary index, and the significance of NOPI tables has decreased.
The Primary AMP Index was introduced to combine some advantages of the Primary Index (single AMP access, binary search, AMP-local join) with the advantages of the NOPI table (no sorting of the rows by hash value required when loading).
How the Primary AMP Index works
How are the Rows of the Primary AMP Index distributed?
As with the Primary Index, the AMP responsible for a row is determined by hashing.
How is the data of the Primary AMP Index stored on the AMPs?
The data is not sorted. This means that a binary search in the data blocks is not possible. All data blocks of the AMP must be searched.
When can the Primary AMP Index be used?
The Primary AMP Index can only be used for tables that are partitioned by columns.
What is the advantage of the Primary AMP Index over a Primary Index?
Rows do not have to be sorted on the AMPs. Bulk loading is, therefore, faster, for example.
What is the advantage of the Primary AMP Index compared to a NOPI table?
AMP-local joins are possible. Even if the hash value is not stored with every row, the optimizer knows the hashed columns and can use this information for AMP-local joins.
So you get the advantage of high performance joins column elimination, and no need to sort the rows during loading.
What should I use for Column Partitioned Tables? PI, NOPI, or Primary AMP Index?
I would start with Primary AMP Index Tables because they have advantages for loading and executing queries (Join Performance).
But as always, this is a design decision that can be different depending on the application.