As it is valid for all Teradata index structures, the goal of the Teradata Hash Index is to minimize disk IOs by offering an alternative access path to the rows.
We can achieve this goal by using three main effects:
- The storage of a vertical subset (columns) of the table in the hash index structure
- The selection of a better Primary Index to avoid costly redistribution activities to join preparation.
- The Ordering of rows by value, to support range scans
The Teradata Hash Index can be designed to cover all selected columns or partially cover the columns.
In the first case, base table access can be skipped altogether, but if the Hash Index does not cover the selected columns, the base table rows can be obtained in a subsequent step. Each index row carries the ROWID, which points to the related base table row (The ROWIDs from the Hash Index are extracted into a spool. In a subsequent step, the spool is used to access the base table rows).
The details above seem similar to the way Teradata Secondary Indexes are designed. But as Hash Indexes allow us to define their own Primary Index, rows can be distributed across the AMPs to fit your workload requirements. One can’t do this with Secondary Indexes.
Like a Non-Unique Secondary Index (NUSI), the rows of a hash index can be stored on the same AMP like the related base table row, simply by choosing the base table’s Primary Index the Primary Index of the hash index.
As valid for any index, Teradata Hash Indexes are maintained automatically by the system. There is overhead involved in UPDATE, INSERT and DELETE statements as the base table and the index structures have to be maintained.
Statistics are crucial and have to be collected at least on the Primary Index columns of the Hash Indexing to allow the Teradata Optimizer to estimate the Hash Index usage costs correctly.
Although the Teradata Hash Index can be viewed as a specialized form of a Single Table Join Index (STJI), some differences exist, which should be considered when choosing between the usage of a Single Table Join Index and the Hash index.
Maybe the most outstanding limitations compared with a Single Table Join Index are:
– A Hash Index cannot have a Partitioned Primary Index
– A Hash Index cannot have a Non-Unique Secondary Index.
– Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
– A hash index cannot be column partitioned
– A hash index must have a Primary Index; a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)
Here is an example for creating a Hash Ordered Hash Index:
CREATE HASH INDEX HASH_IDX (COL1, SimilaryCOL2) ON MYTABLE
BY (COL1) ORDER BY HASH
BY (COL1) ORDER BY VALUE (COL2);
Nevertheless, keep in mind that fallback protection doubles the space needed to store the Hash Index structure.
The conclusion is that Hash Indexes are restricted form a Single Table Join Indexes, and I do not assume a big difference in performance. As an advantage, the syntax for creating a Hash Index is more straightforward than the one used for a Single Table Join index.
More on indexing here: Teradata Indexing