Column Partitioning With The Primary AMP Index

Roland Wenzlofsky

January 26, 2020

minutes reading time

Anyone who has been dealing with Teradata for a long time knows what a primary index is. But at some point, a new term crept in with Teradata Release 15: Primary AMP Index.

In this blog post, I will show you what it is and when a Primary AMP is best used to improve the performance of your loads and queries.

Let’s first look at Teradata’s strategies over the years to distribute data equally across all AMPs.

The Teradata Primary Index

At the very beginning, only the Primary Index was available. The Primary Index defines the columns of a table from which a hash value is calculated. This hash value finally determines which AMP is responsible for which rows.

This hash value sorts the rows of the tables for primary index tables.

The Primary Index, together with the sorting by the hash value, allows the Optimizer to limit the search for rows to the AMPs which contain the rows.

Since the hash value sorts the rows, the AMP can easily find the rows using a binary search in the data blocks.

Teradata can perform joins of two tables with the same primary index AMP-local without redistributing rows.

The Teradata NOPI Table

A few years ago, the possibility of creating tables without a primary index was added. NOPI tables were mandatory if we used column partitioning.

The rows of NOPI tables are randomly but evenly distributed across all AMPs.

Since there is no hash value assigned to specific AMPs, finding rows is only possible by searching for them among all AMPs.

But this is not the end of the story. Also, the finding of the rows within the data blocks cannot be done by binary search without a hash value. This means that Teradata must read all data blocks.

NOPI Tables are mainly used in staging because the bulk load utilities (fastload etc.) do not have to sort the rows by the hash value but only distribute them evenly over all AMPs in a round-robin procedure.

In addition, before Teradata Release 15, it was the only way to use column partitioning.

The only advantage of column partitioning was the restriction of the IOs to the required columns because a full table scan was always necessary.

Joining a NOPI table always requires the redistribution of the rows.

The Teradata Primary AMP Index

The Primary AMP Index is a result of problems recognized by the use of NOPI tables.

NOPI tables are typically not skewed, but it can happen when an INSERT…SELECT of a skewed table into a NOPI table is done. In this case, the rows are inserted into the NOPI table AMP-local.

To solve this problem, the possibility of redistributing the rows according to a hash value has been created (using a new syntax).

In the meantime (since Teradata 15), columnar tables can also have a primary index, and the significance of NOPI tables has decreased.

The Primary AMP Index was introduced to combine some advantages of the Primary Index (single AMP access, binary search, AMP-local join) with the benefits of the NOPI table (no sorting of the rows by hash value required when loading).

How the Primary AMP Index works

How are the Rows of the Primary AMP Index distributed?

As with the Primary Index, the AMP responsible for a row is determined by hashing.

How is the data of the Primary AMP Index stored on the AMPs?

The data is not sorted. This means that a binary search in the data blocks is not possible. The AMP must search all data blocks.

When can the Primary AMP Index be used?

We can only use the Primary AMP Index for tables partitioned by columns.

What is the advantage of the Primary AMP Index over a Primary Index?

Rows do not have to be sorted on the AMPs. Bulk loading is, therefore, faster, for example.

What is the advantage of the Primary AMP Index compared to a NOPI table?

AMP-local joins are possible. Even if the hash value is not stored with every row, the Optimizer knows the hashed columns and can use this information for AMP-local joins.

So you get the advantage of high-performance joins column elimination and no need to sort the rows during loading.

What should I use for Column Partitioned Tables? PI, NOPI, or Primary AMP Index?

I would start with Primary AMP Index Tables because they have advantages for loading and executing queries (Join performance).
But as always, this design decision can vary depending on the application.

  • Could you elaborate on the disadvantages of Primary Amp?
    when it should NOT be used, except for some tactical queries maybe (because all the data blocks of the AMP must be searched)?

  • Avatar
    Luis J Cárdenas says:

    Very useful! thank you!

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like