Understanding Teradata's Primary AMP Index For Improved Performance

Experienced Teradata users are familiar with the concept of a primary index. However, a new term has surfaced with the introduction of Teradata Release 15: Primary AMP Index.

This blog post will demonstrate a Primary AMP Index’s benefits and optimal usage for improving load and query performance.

First, let us examine Teradata’s approaches over the years to distributing data evenly among all AMPs.

Table of Contents

The Teradata Primary Index

Initially, only the Primary Index existed. It defines the table’s columns that generate a hash value, which determines which AMP manages each row.

The hash value is used to sort the rows of primary index tables.

Using the Primary Index and sorting by hash value enables the Optimizer to restrict the row search to the corresponding AMPs.

By sorting the rows using the hash value, the AMP can efficiently locate them through a binary search within the data blocks.

Teradata can join two AMP-local tables without redistributing rows if they have the same primary index.

The Teradata NOPI Table

Column partitioning requires the use of NOPI tables, which allow creating tables without a primary index. This feature was added a few years ago.

NOPI tables are distributed evenly and randomly across all AMPs.

To locate rows, one must search through all AMPs, as no specific hash values are assigned to them.

Furthermore, since no hash value is available to locate rows within data blocks, a binary search is not feasible. Teradata is therefore forced to read all data blocks.

NOPI tables are commonly used in staging, since bulk load utilities such as Fastload distribute rows uniformly across all AMPs using a round-robin method, eliminating the need to sort them by hash value.

Before Teradata Release 15, column partitioning was exclusively available through this method.

The sole benefit of column partitioning in this case is limiting I/Os to only the necessary columns, since a full table scan is always required.

Joining a NOPI table requires row redistribution.

The Teradata Primary AMP Index

The Primary AMP Index was developed to address the limitations identified with NOPI tables.

Skewed tables are uncommon for NOPI tables, although they can occur when transferring data from a skewed table via an INSERT…SELECT command, which inserts rows into the local AMP of the NOPI table.

To address this issue, a new syntax was introduced to enable row redistribution based on a hash value.

Since Teradata 15, columnar tables can have a primary index, making NOPI tables less significant.

The Primary AMP Index was created to combine the benefits of the Primary Index — single-AMP access, binary search, and AMP-local joins — with the advantages of the NOPI table, which eliminates the need to sort rows by hash value during loading.

How the Primary AMP Index Works

How are the Rows of the Primary AMP Index distributed?

As with the Primary Index, the AMP responsible for a row is determined by hashing.

How is the data of the Primary AMP Index stored on the AMPs?

The data is not sorted. This means that a binary search in the data blocks is not possible. The AMP must search all data blocks.

When can the Primary AMP Index be used?

We can only use the Primary AMP Index for tables partitioned by columns.

What is the advantage of the Primary AMP Index over a Primary Index?

Rows do not have to be sorted on the AMPs. Bulk loading is, therefore, faster, for example.

What is the advantage of the Primary AMP Index compared to a NOPI table?

AMP-local joins are possible. Even if the hash value is not stored with every row, the Optimizer knows the hashed columns and can use this information for AMP-local joins.

So you get the advantage of high-performance joins, column elimination, and no need to sort the rows during loading.

What should I use for Column Partitioned Tables? PI, NOPI, or Primary AMP Index?

I would start with Primary AMP Index Tables because they have advantages for loading and executing queries (Join performance).
But as always, this design decision can vary depending on the application.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

2 thoughts on “Understanding Teradata’s Primary AMP Index for Improved Performance”

Ran

09/30/2021 at 3:16 pm

Could you elaborate on the disadvantages of Primary Amp?
when it should NOT be used, except for some tactical queries maybe (because all the data blocks of the AMP must be searched)?
Luis J Cárdenas

02/09/2021 at 6:24 am

Very useful! thank you!

Understanding Teradata’s Primary AMP Index for Improved Performance