Doing Teradata Hashing The Right Way

This article assumes you are familiar with the basic architecture of a Teradata system.

As you know, the AMPs (Access Module Processors) do the actual work on a Teradata system.

Each AMP manages the rows stored on its own virtual disk. How many AMPs there are depends on the size of the system; there may be hundreds.

As noted in my article about the Teradata High-Level Architecture, the primary goal is to distribute rows evenly among all AMPs in order to achieve parallelism.

How Does Teradata Achieve Uniform Data Distribution?

Uniform data distribution depends on the choice of the Primary Index.

The Primary Index and the Primary Key are distinct concepts. The Primary Key belongs to logical data modeling, whereas the Primary Index is a physical design choice in Teradata that determines how rows are distributed and accessed.

Primary Index vs. Primary Key

Please note that the following description is simplified, but it should be sufficient to understand the general concepts.

The Primary Index consists of one or more table columns whose values are the input to a highly efficient hashing algorithm. The output of the algorithm determines the AMP responsible for the row. That AMP stores the row on its virtual disk and is solely responsible for all further work on it.
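The mapping from Primary Index values to an AMP can be sketched in a few lines of Python. This is only a toy model under stated assumptions: MD5 stands in for Teradata's hashing algorithm, the upper 16 bits of the hash act as the hash bucket, a simple modulo stands in for the mapping of buckets to AMPs, and the names `row_hash`, `amp_for`, and `NUM_AMPS` are hypothetical.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size

def row_hash(pi_values):
    """Stand-in for the row hash: 32 bits derived from the PI column values."""
    data = "|".join(repr(v) for v in pi_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")

def amp_for(pi_values):
    """Row hash -> hash bucket (upper 16 bits) -> AMP (modulo as a stand-in)."""
    bucket = row_hash(pi_values) >> 16
    return bucket % NUM_AMPS

# Each AMP stores only the rows whose PI values hash to it.
amps = {amp: [] for amp in range(NUM_AMPS)}
for customer_id in range(1, 1001):
    row = (customer_id, f"customer_{customer_id}")
    amps[amp_for((customer_id,))].append(row)

for amp, rows in amps.items():
    print(f"AMP {amp}: {len(rows)} rows")
```

The same Primary Index value always hashes to the same AMP, which is why the owning AMP can be found again later without searching the others.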

The order in which the columns are passed to the hashing algorithm does not matter. Data types do matter, however: the same value stored in different data types can produce different hash results, so columns that are supposed to hash alike need compatible data types.
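Both properties can be illustrated with a toy hash in Python. The way Teradata combines the columns is not described here, so the addition modulo 2**32 used below is purely an assumption, chosen because it is commutative; including the Python type name in the input is likewise only a stand-in for the effect of different data types. The function names are hypothetical.

```python
import hashlib

def column_hash(value):
    """Hash one column value; the type name is part of the input, so 1 and '1' differ."""
    data = f"{type(value).__name__}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")

def row_hash(*pi_values):
    """Combine the column hashes by addition modulo 2**32 (order-independent)."""
    return sum(column_hash(v) for v in pi_values) % 2**32

# Same columns, different order: the toy hash is identical.
print(row_hash(10, "A") == row_hash("A", 10))

# Same value, different type (integer 1 vs. character '1'): the toy hash differs.
print(row_hash(1) == row_hash("1"))
```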

Data Distribution Strategy

The data distribution strategy is both straightforward and effective. Each AMP is given a portion of the data, and which rows go to which AMP is determined solely by the Primary Index. Every table is spread across the AMPs of the Teradata system, and each AMP stores and maintains its own portion of each table's rows.

Because the AMPs work in parallel, the AMP with the most rows to handle determines the overall response time of any DML statement that involves all AMPs. Distributing rows evenly across all AMPs is the basis for linear scalability and should be the primary focus when designing the physical data model.
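The effect of the Primary Index choice on the busiest AMP can be made visible with a small simulation. It is a sketch under assumptions: MD5 with a modulo stands in for Teradata's hashing, `NUM_AMPS` is an arbitrary system size, and `skew_ratio` is a hypothetical measure (busiest AMP divided by the average), not a Teradata metric.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical system size

def amp_for(pi_value):
    """Toy mapping of a PI value to an AMP (MD5 is only a stand-in)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

def rows_per_amp(pi_values):
    """Count how many rows each AMP would have to store."""
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

def skew_ratio(counts):
    """Busiest AMP relative to the average; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

unique_ids = list(range(10_000))              # e.g. a customer id
few_values = [i % 3 for i in range(10_000)]   # e.g. a status code with 3 values

even = rows_per_amp(unique_ids)
skewed = rows_per_amp(few_values)
print("unique PI values:", even, round(skew_ratio(even), 2))
print("3 distinct values:", skewed, round(skew_ratio(skewed), 2))
```

With only three distinct Primary Index values, at most three AMPs receive any rows, so the busiest AMP holds far more than its fair share while the others sit idle.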

Teradata’s Primary Index also provides direct access to data: as in other hash-based lookups, the hash of the Primary Index value leads straight to the AMP that holds the row. Access via the Primary Index is the fastest way to retrieve rows from disk.
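The difference between Primary Index access and access via any other column can be sketched as follows. This is a toy model, not how Teradata is implemented: each AMP is a Python dictionary, MD5 with a modulo stands in for the hashing algorithm, and `select_by_pi` and `select_by_other_column` are hypothetical helpers that also report how many AMPs they had to touch.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size

def amp_for(pi_value):
    """Toy mapping of a PI value to an AMP (MD5 is only a stand-in)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

# One dictionary per AMP: PI value -> row.
amps = [dict() for _ in range(NUM_AMPS)]
for order_id in range(1, 101):
    amps[amp_for(order_id)][order_id] = {"order_id": order_id, "amount": order_id * 10}

def select_by_pi(order_id):
    """PI value supplied: only the AMP that owns the hash is touched."""
    amp = amp_for(order_id)
    return amps[amp].get(order_id), 1

def select_by_other_column(amount):
    """No PI value supplied: every AMP has to scan its rows."""
    hits = [row for amp in amps for row in amp.values() if row["amount"] == amount]
    return hits, len(amps)

print(select_by_pi(42))
print(select_by_other_column(420))
```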

Changing the Primary Index

Changing the values of the Primary Index columns of a row changes its hash, so the row usually has to be handed over to another AMP, which is expensive. Therefore, avoid updating Primary Index columns.
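Why such an update is more than an in-place change can be seen in a toy model. It assumes that MD5 with a modulo stands in for the hashing algorithm and that each AMP is a plain dictionary; `insert` and `update_pi` are hypothetical helpers.

```python
import hashlib

NUM_AMPS = 16  # hypothetical system size

def amp_for(pi_value):
    """Toy mapping of a PI value to an AMP (MD5 is only a stand-in)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS

amps = [dict() for _ in range(NUM_AMPS)]

def insert(pi_value, row):
    """Store the row on the AMP its PI value hashes to."""
    amps[amp_for(pi_value)][pi_value] = row

def update_pi(old_value, new_value):
    """Delete the row where the old value hashed and re-insert it where the new one hashes."""
    source, target = amp_for(old_value), amp_for(new_value)
    row = amps[source].pop(old_value)
    amps[target][new_value] = row
    return source, target

insert(1001, {"name": "Alice"})
source, target = update_pi(1001, 2002)
print(f"row was on AMP {source}, is now on AMP {target}")
```

In this sketch the old and the new value can happen to hash to the same AMP, but even then the row is removed and inserted again rather than changed in place.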

Primary Indexes are either UNIQUE or non-UNIQUE. Rows with the same Primary Index values are assigned the same hash value. To tell such rows apart, a uniqueness value is added to the hash value. Both rows are still managed by the same AMP.
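How a uniqueness value separates rows with the same hash can be sketched like this. It is a simplified model under assumptions: MD5 stands in for the hashing algorithm, the row identifier is modeled as a `(hash, uniqueness)` tuple, and the `Amp` class with its counter is hypothetical.

```python
import hashlib
from collections import defaultdict

def row_hash(pi_value):
    """Toy 32-bit row hash (MD5 is only a stand-in)."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class Amp:
    """A single AMP; rows with equal PI values always end up on the same one."""

    def __init__(self):
        self.rows = {}                     # (row hash, uniqueness value) -> row
        self.last_uniq = defaultdict(int)  # row hash -> last uniqueness value used

    def insert(self, pi_value, row):
        h = row_hash(pi_value)
        self.last_uniq[h] += 1             # next free uniqueness value for this hash
        row_id = (h, self.last_uniq[h])
        self.rows[row_id] = row
        return row_id

amp = Amp()
id1 = amp.insert("Smith", {"last_name": "Smith", "first_name": "Anna"})
id2 = amp.insert("Smith", {"last_name": "Smith", "first_name": "Bob"})
print(id1)
print(id2)
```

Both identifiers share the hash part and differ only in the uniqueness part, so the two rows can be stored side by side without overwriting each other.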

While the data distribution process involves additional details, the preceding description should suffice for your everyday use of Teradata.

2 thoughts on “Doing Teradata Hashing The Right Way”

  1. Hi. I assume the primary index you want to use is not distributing the rows evenly across all AMPs, and you run out of space on a single AMP or a few AMPs.

  2. Hi,

    Can I have Primary Key columns that are different from the Primary Index? I tried doing that, and my ETL process fails, complaining about space issues. But I have ample space, and when I make the Primary Key columns the same as the Primary Index, it works just fine.

    I will appreciate your response or any input.

    Regards
    Nirav
