Teradata Value Ordering for Join Index & NUSI
On the internet, there are several articles available, about Teradata Hashing and about how physical data rows are evenly distributed across all AMPs and sorted by ROWID.
There is almost no information about value ordering available. The only on-hand information I know is the official Teradata documentation giving a vague statement like “value order is a way of allowing for optimized range scans of values.” The majority of the technical descriptions terminate this topic with an example, showing how to use WHERE conditions on a range of dates. Obviously, this is not enough for a firm understanding of this subject.
I think it is time to shed some light on value ordering.
In principle, we have two possibilities to use value order:
Historically only the non-unique secondary index allowed for value order. Starting with the release of Teradata 12, value ordering for join indexes was introduced.
How Teradata Value Ordering is implemented
Those of you which were already reading our physical design posts will know how data retrieval works on a Teradata system:
- The Parsing Engine calculates a hash value (the ROWHASH) over the Primary Index column(s) of the rows.
- The so-called hashmap is consulted to identify the responsible AMP for each row
- The AMP checks its Master Index to determine the cylinder carrying the row(s)
- The AMP checks the Cylinder Index to determine the disk sector(s) taking the row(s) (a sequence of sectors makes up a data block)
Usually, in the case of Primary Index hashing, the Master Index and the Cylinder Index carry information about where each searched ROWHASH can be found (i.e., on which cylinder and in which sectors the rows are stored). Furthermore, row pointers within a data block are maintained, sorted by ROWID.
Value ordering means, that Master Index and Cylinder Index store information about where a 32-bit value (not the ROWHASH) is located. Furthermore, the data block contains a row pointer array sorted by this 32-bit value(s).
Value ordering is exclusively allowed for 32-bit integers (which includes the “date” data type, as it is stored internally as a 32-bit integer). The reason is that the physical structures (row pointer array, Master Index dictionary, Cylinder Index dictionary) which are used by the fundamental hashing technique have been reused without any design changes.
Let me make some important remarks here: Value ordering is always an All-AMP operation. Although the Join Index allows us to define any Primary Index, it is only used for data distribution in this case. The second option, the NUSI, anyway requires always a full table scan by design.
Value ordering of a Join Index gives the same advantages one would have from a single partition PPI table but with less space usage overhead.
Value order index access is always an All-AMP access with a subsequent search of a 32-bit value.
I hope it is clear now, why the strength of the value ordering method lies in the “order” of the 32-bit integer column in each of the physical structures (Master Index, Cylinder Index, Data Block).
More on indexing here: Teradata Indexing