The Secondary Index in Teradata provides an alternate access path to the data records to reduce the disk IO’s while retrieving the data.
While B+ Tree Indexes are quite commonly used in database systems, Teradata uses a different approach.
Teradata distinguishes between unique secondary indexes (USI) and non-unique secondary indexes (NUSI). Although the naming may seduce to assume they are technically the same, they are not at all. This article is related to USI.
Unique secondary indexes are based on the same hashing algorithm Teradata applies to distribute data evenly across all disks. Technically, unique secondary indexes are tables like any other table. However, they are so-called sub-tables, which means they do not show up as a real object in any dictionary table (in contrast to join indexes).
The base table data is hashed by the primary index column(s). In contrast, the secondary index rows are hashed by another column or combination of columns, which presents the additional data path.
Each USI row contains the ROWID of the base table row, making it possible to lookup the base table row directly.
Teradata USI access is most of the time 2-AMP access: At first, the secondary index row is looked up. This row contains the ROWID of the base table, which is used to lookup the base table row. Sometimes, if lucky, the USI row and the base table row are located at the same AMP, making the retrieval of the base table row AMP-local:
The same as a unique primary index, the unique secondary index distributes evenly across the AMPs, and there is no need to take care of data skewing.
As always, Teradata needs recent statistics to consider index usage. USI has to be highly selective. Otherwise, Teradata probably decides not to use it when building the execution plan. It is stated that if more than 10% of the table rows would be selected from the USI, it will not be used. 10% is only a rule of thumb.
The usual approach is creating the USI you would like to be used and check if it is indeed used (EXPLAIN statement). If used, keep it, otherwise drop it (remember, unique secondary indexes are physical tables occupying disk space).
Information about the Teradata USI you will find nowhere else:
The detailed data retrieval process
- The Parsing Engine is consulting the hash map and locates the AMP carrying the USI row.
- The AMP containing the USI row finds the USI data block via the Master Index & the Cylinder Index.
- The USI data block is moved into the FSG cache.
- The AMP does an in-memory binary search of the data block and locates the USI row.
- The designated row carries the ROWID of the base table. The Parsing Engine uses this ROWID to look up the base table AMP and instructs it to retrieve the base table row.
- The base table AMP locates the data block via Master Index & Cylinder Index lookup.
- The base table AMP moves the data block into the FSG cache.
- With an in-memory binary search on the base table block, the base table row is located.
The Secondary Index in Teradata – The NUSI compendium
The Teradata Join Index Guide – We leave no questions unanswered!
The Teradata Partitioned Primary Index (PPI) Guide
Watch the Teradata Indexing Video Course