As it is valid for all Teradata index structures, the goal of the Teradata Hash Index is to minimize disk IOs by offering an alternative access path to the rows.
We can achieve this goal by using three main effects:
- The storage of a vertical subset (columns) of the table in the hash index structure
- Select a better Primary Index to avoid costly redistribution activities to join preparation.
- The Ordering of rows by value to support range scans
The Teradata Hash Index can be designed to cover all selected columns or partially cover the columns.
In the first case, base table access can be skipped altogether, but if the Hash Index does not cover the selected columns, the base table rows can be obtained in a subsequent step. Each index row carries the ROWID, which points to the corresponding base table row (The ROWIDs from the Hash Index are extracted into a spool. In a subsequent step, the spool is used to access the base table rows).
The details above seem similar to the way Teradata Secondary Indexes are designed.But as Hash Indexes allow us to define their own Primary Index, rows can be distributed across the AMPs to fit your workload requirements. One can’t do this with Secondary Indexes.
Like a Non-Unique Secondary Index (NUSI), the rows of a hash index can be stored on the same AMP as the related base table row simply by choosing the base table’s Primary Index, the Primary Index of the hash index.
As valid for any index, the system maintains Teradata Hash Indexes automatically. There is overhead involved in UPDATE, INSERT and DELETE statements as the base table, and the index structures must be maintained.
Statistics are crucial and must be collected at least on the Primary Index columns of the Hash Indexing to allow the Teradata Optimizer to estimate the Hash Index usage costs correctly.
Although the Teradata Hash Index can be viewed as a specialized form of a Single Table Join Index (STJI), some differences exist, which should be considered when choosing between the usage of a Single Table Join Index and the Hash index.
Maybe the most outstanding limitations compared with a Single Table Join Index are:
- A Hash Index cannot have a Partitioned Primary Index
- A Hash Index cannot have a Non-Unique Secondary Index.
- Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
- A hash index cannot be column partitioned
- A hash index must have a Primary Index; a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)
Here is an example for creating a Hash Ordered Hash Index:
CREATE HASH INDEX HASH_IDX (COL1, SimilaryCOL2) ON MYTABLE
BY (COL1) ORDER BY HASH
CREATE HASH INDEX HASH_IDX (COL1, COL2) ON MYTABLE
BY (COL1) ORDER BY VALUE (COL2);
In both cases, the “by” clause defines the columns used for data distribution (like the Primary Index), but keep in mind that the columns used for data distribution have to be part of the columns which make up the Hash Index column list. If we want to increase fault tolerance, FALLBACK protection can be used in the same way as it is used for base tables.
The second copy of each Hash Index row will be stored on a fallback AMP. In case the primary AMP is down, and you don’t have fallback protection, you cannot use the Hash Index anymore, and updates on the base table are not allowed until the AMP is coming online again.
Nevertheless, keep in mind that fallback protection doubles the space needed to store the Hash Index structure.
The conclusion is that Hash Indexes are restricted from Single Table Join Indexes, and I do not assume a big difference in performance. As an advantage, the syntax for creating a Hash Index is more straightforward than the one used for a Single Table Join index.
More on indexing here: Teradata Indexing