fbpx

Doing Teradata Hashing The Right Way

By Roland Wenzlofsky

March 12, 2014


This article assumes that you are already familiar with the general architecture of a Teradata System.

As you should already know, the AMPs are responsible for doing the work on a Teradata System.

Depending on the size of the system, you will have hundreds of AMPs, each one being in charge of handling the rows stored on its related virtual disk.

As I  already pointed out in an article about the Teradata High-Level Architecture, the primary goal is an even distribution of rows across all AMPs. Even distribution ensures parallelism.

How is an even data distribution on Teradata achieved?

It is dependent on the Primary Index.

Please don’t confuse the Primary Index with the term Primary Key! While Primary Key is a term used in data modeling, Primary Index is a scientific concept on Teradata.

Over the following paragraph, be aware that this is a simplified example, but enough for understanding the general concepts.

The Primary Index is a set of table columns which is used as input to a very efficient hashing algorithm. The output of applying the algorithm to the Primary Index value is the assignment of a  responsible AMP which will take over all work related to the observed row. The receiving AMP will write the row to its related virtual disk, and so it becomes the only one responsible for handling this row ever after.

As a side note, the order of the columns passed to the hashing algorithm does not matter. Still, data types will possibly matter. There are compatible data types and such which are considered different.

This data distribution strategy is straightforward and efficient. Each AMP holds a portion of the data; the Primary Index exclusively determines row distribution was chosen. Each AMP will maintain records of all tables available on the Teradata System.

Keep in mind that the slowest AMP determines the overall answer time of any DML statement. It will be the one with the most rows to handle. Linear scalability is ensured with an even distribution of rows across all AMPs. Linear scalability is the primary goal in designing the physical data model.

Like hashing algorithms in general, the Primary Index concept on Teradata allows direct access to the data related to a particular hash value. Primary Index access is the fastest way of retrieving rows from the disks.

Please be aware that any change of the Primary Index columns on a row will trigger a handover of this row to another responsible AMP. Avoid changing the PI as it is a costly process.

Primary Indexes can be defined UNIQUE or non-UNIQUE. Two records with the same content will be assigned the same hash value. To be able to distinguish them, more information will be added, i.e. a uniqueness value. Still, both records are handled by the same AMP.

Although there are more details involved in the data distribution process, the above description should be enough for you daily work with Teradata.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Avatar
    Roland Wenzlofsky says:

    HI. I assume the primary index you want to use is not distrubuting the rows evenly across all AMPs and you run out of space on a single AMP or a few AMPs

  • Avatar
    material.study says:

    Hi,

    Can I have Primary key columns different than the Primary Index? I tried doing that and my ETL process is failing to complain about the space issues. But, I have ample space and when I make Primary Key columns the same as Primary Index, it works just fine.

    I will appreciate your response or any input.

    Regards
    Nirav

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >