Doing Teradata Hashing The Right Way

This article assumes you are familiar with the basic architecture of a Teradata system.

As you know, the work on a Teradata system is done by the AMPs (Access Module Processors).

Each AMP manages the rows stored on its own virtual disk. The number of AMPs depends on the size of the system; there may be hundreds of them.

As noted in my article about the Teradata High-Level Architecture, the primary goal is to distribute rows evenly among all AMPs in order to achieve parallelism.

How Does Teradata Achieve Uniform Data Distribution?

How evenly the rows of a table are distributed across the AMPs depends on the Primary Index.

The Primary Index and Primary Key are distinct concepts. The Primary Key is used in data modeling, whereas the Primary Index is a physical design principle in Teradata.

Primary Index vs. Primary Key

Please note that the following paragraph is a simplified example but should provide an adequate understanding of the general concepts.

The Primary Index comprises table columns used as input for a highly efficient hashing algorithm. The algorithm’s output designates a responsible AMP that assumes all tasks pertaining to the corresponding row. The designated AMP stores the row on its associated virtual disk and becomes solely accountable for handling it thereafter.
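The mapping from Primary Index values to an AMP can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Teradata's real hashing algorithm is proprietary, so `hashlib.md5` serves as a stand-in, the modulo step is a simplification of how the hash value is resolved to an AMP, and the names `NUM_AMPS`, `row_hash`, and `owning_amp` are made up for this example.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(*pi_values):
    """Stand-in for Teradata's hashing algorithm: PI values -> 32-bit integer."""
    key = "|".join(repr(v) for v in pi_values).encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big")


def owning_amp(*pi_values):
    """Pick the AMP responsible for a row (simplified to a plain modulo)."""
    return row_hash(*pi_values) % NUM_AMPS


# customer_id plays the role of the Primary Index column
rows = [(1001, "Smith"), (1002, "Jones"), (1003, "Miller"), (1001, "Smith Jr.")]

placement = {}
for customer_id, name in rows:
    placement.setdefault(owning_amp(customer_id), []).append((customer_id, name))

for amp, amp_rows in sorted(placement.items()):
    print(f"AMP {amp}: {amp_rows}")
```

Both rows with `customer_id` 1001 end up on the same AMP, because only the Primary Index value goes into the hash; the other columns play no role.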

The order in which the columns are passed to the hashing algorithm does not matter. The data types do matter, however: the same value stored in incompatible data types yields different hash results.

Data Distribution Strategy

The data distribution strategy is both simple and effective. Each AMP is allocated a portion of the data, and the Primary Index alone determines which rows go to which AMP. Every table is spread across the AMPs of the Teradata system, and each AMP stores and maintains its share of the rows.

The AMP with the most rows to handle will determine the overall response time for any DML statement. Evenly distributing rows across all AMPs will ensure linear scalability, which should be the primary focus when designing the physical data model.
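To make the effect of the Primary Index choice on skew visible, here is a small simulation. It is a sketch, not a measurement of a real system: the hash is an MD5 stand-in for Teradata's algorithm, and `NUM_AMPS`, `rows_per_amp`, and `skew_ratio` are hypothetical names chosen for this example.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical system size


def owning_amp(pi_value):
    """Stand-in hash: maps a PI value to one of NUM_AMPS AMPs."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def rows_per_amp(pi_values):
    """Count how many rows each AMP would have to store."""
    counts = Counter(owning_amp(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


def skew_ratio(per_amp):
    """Row count of the busiest AMP divided by the average per AMP."""
    return max(per_amp) / (sum(per_amp) / len(per_amp))


# 10,000 rows: a unique column vs. a column with only two distinct values
unique_pi = rows_per_amp(range(10_000))
skewed_pi = rows_per_amp(i % 2 for i in range(10_000))

print("unique PI :", unique_pi, round(skew_ratio(unique_pi), 2))
print("2-value PI:", skewed_pi, round(skew_ratio(skewed_pi), 2))
```

With only two distinct Primary Index values, at most two AMPs receive any rows, so the busiest AMP holds at least four times the average. That AMP sets the response time while the others sit idle.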

Teradata's Primary Index allows direct access to a row through its hash value, as in other hash-based lookup schemes. Access via the Primary Index is the fastest way to retrieve rows from disk.
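The difference between Primary Index access and any other access path can be illustrated with a toy model. The following sketch assumes a stand-in hash (MD5 instead of Teradata's algorithm) and models each AMP as a Python dictionary; `select_by_pi` and `select_by_city` are hypothetical helpers, not Teradata functions.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def owning_amp(pi_value):
    """Stand-in hash: maps a PI value to one of NUM_AMPS AMPs."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Each AMP keeps only its own rows, keyed by the PI value (customer_id)
amps = [dict() for _ in range(NUM_AMPS)]
for customer_id in range(100):
    row = {"customer_id": customer_id, "city": f"city_{customer_id % 10}"}
    amps[owning_amp(customer_id)].setdefault(customer_id, []).append(row)


def select_by_pi(customer_id):
    """The hash names the owning AMP, so only that AMP is asked."""
    amp = owning_amp(customer_id)
    return amps[amp].get(customer_id, []), 1  # (rows, AMPs touched)


def select_by_city(city):
    """Without a PI value, every AMP has to scan all of its rows."""
    hits = [
        row
        for amp in amps
        for amp_rows in amp.values()
        for row in amp_rows
        if row["city"] == city
    ]
    return hits, NUM_AMPS  # (rows, AMPs touched)


print(select_by_pi(42))
print(len(select_by_city("city_3")[0]), "rows,", select_by_city("city_3")[1], "AMPs")
```

The lookup by `customer_id` involves a single AMP and a single keyed access, while the lookup by `city` forces every AMP to go through its rows.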

Changing the Primary Index

Changing the value of a Primary Index column in a row changes the row's hash value, so the row usually has to be handed over to a different AMP, which is expensive. It is therefore advisable to avoid updating Primary Index columns.
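Why such an update is costly becomes clearer when it is spelled out as a delete on one AMP followed by an insert on another. The sketch below is a toy model with a unique Primary Index: the hash is an MD5 stand-in, the AMPs are dictionaries, and `insert_row` and `update_pi_value` are hypothetical helpers.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def owning_amp(pi_value):
    """Stand-in hash: maps a PI value to one of NUM_AMPS AMPs."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


amps = [dict() for _ in range(NUM_AMPS)]


def insert_row(customer_id, name):
    """Store the row on the AMP that owns its PI value."""
    amps[owning_amp(customer_id)][customer_id] = {
        "customer_id": customer_id,
        "name": name,
    }


def update_pi_value(old_id, new_id):
    """Change the PI value of a row; returns (source AMP, target AMP)."""
    source, target = owning_amp(old_id), owning_amp(new_id)
    row = amps[source].pop(old_id)   # remove from the AMP owning the old hash
    row["customer_id"] = new_id
    amps[target][new_id] = row       # the AMP owning the new hash takes over
    return source, target


insert_row(1001, "Smith")
moved_from, moved_to = update_pi_value(1001, 2002)
print(f"row moved from AMP {moved_from} to AMP {moved_to}")
```

Updating any other column would leave the row where it is; only a change to the Primary Index value triggers this relocation.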

A Primary Index is either UNIQUE or NON-UNIQUE. Rows with the same Primary Index values are assigned identical hash values. To tell such rows apart, additional information, a uniqueness value, is added to each of them. All of these rows are still managed by the same AMP.
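The combination of hash value and uniqueness value can be pictured as a two-part row identifier. The following is a minimal sketch under assumptions: MD5 stands in for Teradata's hash, the counter starting at 1 is chosen for illustration, and `assign_row_id` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def row_hash(pi_value):
    """Stand-in for Teradata's hashing algorithm: PI value -> 32-bit integer."""
    digest = hashlib.md5(repr(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# One running counter per hash value (assumed to start at 1)
uniqueness_counter = defaultdict(int)


def assign_row_id(pi_value):
    """Return (row hash, uniqueness value) for a newly inserted row."""
    h = row_hash(pi_value)
    uniqueness_counter[h] += 1
    return (h, uniqueness_counter[h])


# last_name acts as a non-unique Primary Index
ids = [assign_row_id(name) for name in ["Smith", "Smith", "Jones"]]
for row_id in ids:
    print(row_id)
```

The two "Smith" rows share the same hash, and therefore the same AMP, but remain distinguishable through the second component of the identifier.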

While the data distribution process involves additional details, the preceding description should suffice for your everyday use of Teradata.

2 thoughts on “Doing Teradata Hashing The Right Way”

Roland Wenzlofsky

09/06/2016 at 2:28 pm

Hi. I assume the Primary Index you want to use does not distribute the rows evenly across all AMPs, so you run out of space on a single AMP or a few AMPs.
material.study

08/29/2016 at 10:17 pm

Hi,

Can I have Primary Key columns that differ from the Primary Index? I tried doing that, and my ETL process fails, complaining about space issues. But I have ample space, and when I make the Primary Key columns the same as the Primary Index, it works just fine.

I will appreciate your response or any input.

Regards
Nirav
