Teradata Data Types and Hashing
Data Type conversions are costly operations (CPU). The wrong choice of data types causes some very nasty side effects.
One of the major drawbacks, if you didn’t care about consistent choice of data types, is related to the way Teradata is distributing rows across all AMPs.
The hashing algorithm is sensitive to the data type. Column values are used for the calculation of the hash value. Values which appear to be the same to you, may generate different hash values – with all its consequences.
You may assume the literals 3 and ‘3’ are the same from a hashing point of view, but a simple test will prove you wrong:
SELECT HASHROW(3) — 6D-27-DA-A6
SELECT HASHROW(‘3’) — 3F-DD-85-0D
As above example shows, two different hash values are generated. This means, 2 diffferent AMPs will hold the rows (disregarding the case where by luck both values are hashing to the same AMP).
On Teradata, classes of hash-compatible data types exists. They are interchangeably.
All Integer Data Types create the same hash value (INTEGER, BIGINT,BYTEINT,SMALLINT), interestingly any DECIMAL without precision (like DECIMAL(18,0) creates the same hash as the integer data types. I assume, decimals without precisions are internally stored as integers. Maybe somebody can comment on this.
Similar to integers, all character data types are hash compatible (CHAR, VARCHAR, etc.)
The main reason for running into problems with non-matching data types is a bad data model. Take already care, when designing the physical model, otherwise you may have to do costly changes late in the project.