Teradata Data Types and Hashing

1
628

teradata data typesData Type conversions are costly operations (CPU). The wrong choice of data types causes some very nasty side effects.

One of the major drawbacks, if you didn’t care about consistent choice of data types, is related to the way Teradata is distributing rows across all AMPs.

The hashing algorithm is sensitive to the data type. Column values are used for the calculation of the hash value. Values which appear to be the same to you, may generate different hash values – with all its consequences.

Different hash values take away the possibility of AMP local operations, like the retrieving of rows with a primary index access, or the possibility to execute a resource gentle AMP-local  join.

You may assume the literals 3 and ‘3’ are the same from a hashing point of view, but a simple test will prove you wrong:

SELECT HASHROW(3) — 6D-27-DA-A6

SELECT HASHROW(‘3’) — 3F-DD-85-0D

As above example shows, two different hash values are generated. This means, 2 diffferent AMPs will hold the rows (disregarding the case where by luck both values are hashing to the same AMP).

On Teradata, classes of hash-compatible data types exists. They are interchangeably.

For example:

All Integer Data Types create the same hash value (INTEGER, BIGINT,BYTEINT,SMALLINT), interestingly any DECIMAL without precision (like DECIMAL(18,0) creates the same hash as the integer data types. I assume, decimals without precisions are internally stored as integers. Maybe somebody can comment on this.

Similar to integers, all character data types are hash compatible (CHAR, VARCHAR, etc.)

The main reason for running into problems with non-matching data types is a bad data model. Take already care, when designing the physical model, otherwise you may have to do costly changes late in the project.

Our Reader Score
[Total: 5    Average: 4.8/5]
Teradata Data Types and Hashing written by Roland Wenzlofsky on April 11, 2014 average rating 4.8/5 - 5 user ratings

1 COMMENT

  1. Hi Roland,
    Just to pick up on a question that you asked above. Yes, any DECIMAL data type with 0 digits after the decimal point will always generate the same hash as the same value in one of the integer data types.

    It is for the reason that you thought, in Teradata DECIMAL columns are stored as ‘scaled integers’. So, a value of 12.3 stored in a DECIMAL(10,2) is stored as 1230. Similarly, the same value stored in a DECIMAL(10,1) would be stored as 123. This is why decimals with 0 decimal places will hash the same as integers.

    Cheers,
    Dave

LEAVE A REPLY

Please enter your comment!
Please enter your name here