1

Teradata Data Types and Hashing

Data Type conversions are costly operations (CPU). The wrong choice of data types causes some very nasty side effects.

One of the major drawbacks, if you didn’t care about consistent choice of data types, is related to the way Teradata is distributing rows across all AMPs.

The hashing algorithm is sensitive to the data type. Column values are used for the calculation of the hash value. Values which appear to be the same to you, may generate different hash values – with all its consequences.

Different hash values take away the possibility of AMP local operations, like the retrieving of rows with a primary index access, or the possibility to execute a resource gentle AMP-local  join.

You may assume the literals 3 and ‘3’ are the same from a hashing point of view, but a simple test will prove you wrong:

SELECT HASHROW(3) — 6D-27-DA-A6

SELECT HASHROW(‘3’) — 3F-DD-85-0D

As above example shows, two different hash values are generated. This means, 2 diffferent AMPs will hold the rows (disregarding the case where by luck both values are hashing to the same AMP).

On Teradata, classes of hash-compatible data types exists. They are interchangeably.

For example:

All Integer Data Types create the same hash value (INTEGER, BIGINT,BYTEINT,SMALLINT), interestingly any DECIMAL without precision (like DECIMAL(18,0) creates the same hash as the integer data types. I assume, decimals without precisions are internally stored as integers. Maybe somebody can comment on this.

Similar to integers, all character data types are hash compatible (CHAR, VARCHAR, etc.)

The main reason for running into problems with non-matching data types is a bad data model. Take already care, when designing the physical model, otherwise you may have to do costly changes late in the project.

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Dave Wellman Dave Wellman says:

    Hi Roland,
    Just to pick up on a question that you asked above. Yes, any DECIMAL data type with 0 digits after the decimal point will always generate the same hash as the same value in one of the integer data types.

    It is for the reason that you thought, in Teradata DECIMAL columns are stored as ‘scaled integers’. So, a value of 12.3 stored in a DECIMAL(10,2) is stored as 1230. Similarly, the same value stored in a DECIMAL(10,1) would be stored as 123. This is why decimals with 0 decimal places will hash the same as integers.

    Cheers,
    Dave

  • >