The Teradata Hash Index Guide
As it is valid for all Teradata index structures, the goal of the Teradata Hash Index is to minimize disk IOs by offering an alternative access path to the rows.
We can achieve this goal, by using three main effects:
- The storage of a vertical subset (columns) of the table in the hash index structure
- The selection of a better Primary Index to avoid costly redistribution activities for join preparation.
- The Ordering of rows by value, to support range scans
The Teradata Hash Index can be designed to cover all selected columns, or it may only partially cover the columns.
In the first case, base table access can be skipped altogether, but if the Hash Index is not covering the selected columns, the base table rows can be obtained in a subsequent step. Each index row carries the ROWID, which is pointing to the related base table row (The ROWIDs from the Hash Index are extracted into a spool, in a subsequent step the spool is used to access the base table rows).
The details above seem similar to the way Teradata Secondary Indexes are designed. But as Hash Indexes allow to define their own Primary Index, rows can be distributed across the AMPs in a way which fits your workload requirements. One can’t do this with Secondary Indexes.
Similar to a Non-Unique Secondary Index (NUSI), the rows of a hash index can be stored on the same AMP like the related base table row, simply by choosing the Primary Index of the base table as the Primary Index of the hash index.
As valid for any index, Teradata Hash Indexes are maintained automatically by the system, and there is overhead involved in the case of UPDATE, INSERT and DELETE statements as the base table, and the index structures have to be maintained.
Statistics are crucial and have to be collected at least on the Primary Index columns of the Hash Indexing to allow the Teradata Optimizer to estimate the costs of the Hash Index usage correctly.
Although the Teradata Hash Index can be viewed as a specialized form of a Single Table Join Index (STJI) – there are a lot of features in common – some differences exist which should be considered when having the choice between the usage of a Single Table Join Index and the Hash index.
Maybe the most outstanding limitations compared with a Single Table Join Index are:
– A Hash Index cannot have a Partitioned Primary Index
– A Hash Index cannot have a Non-Unique Secondary Index.
– Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
– A hash index cannot be column partitioned
– A hash index must have a Primary Index; a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)
Here is an example for creating a Hash Ordered Hash Index:
CREATE HASH INDEX HASH_IDX (COL1, SimilaryCOL2) ON MYTABLE
BY (COL1) ORDER BY HASH
BY (COL1) ORDER BY VALUE (COL2);In both cases, the “by” clause defines the columns used for data distribution (like the Primary Index), but keep in mind that the columns used for data distribution have to be part of the columns which make up the Hash Index column list.If we want to increase fault tolerance, FALLBACK protection can be used in the same way like it is used for base tables.
A second copy of each Hash Index row will be stored on a fallback AMP. In case the main AMP is down, and you don’t have fallback protection, you are not able to use the Hash Index anymore and updates on the base table are not allowed until the AMP is coming online again.
Nevertheless, keep in mind, that fallback protection doubles the space needed to store the Hash Index structure.
The conclusion is that Hash Indexes are a restricted form a Single Table Join Indexes and I do not assume a big difference in performance. As an advantage, the syntax for creating a Hash Index is simpler than the one used for a Single Table Join index.
More on indexing here: Teradata Indexing