When Teradata Indexing fails
Selecting rows from extensive tables with billions of entries can be resource-intensive. While it may not cause significant strain on resources when accessing rows directly through primary indices or unique secondary indices (USI), using a “LIKE” operator to select rows can be extremely resource-intensive.
SELECT * FROM wide_table WHERE column_B LIKE ‘%A%’;
Typically, the aforementioned inquiry is resolved through a full table scan (FTS), without any additional measures.
Using a NUSI or single table join index (STJI) can minimize disk IO and decrease query runtime.
Using an STJI may be viable; however, persuading the optimizer to replace the FTS on the base table with an FTS on a NUSI sub-table could prove challenging. The inadequate statistics (whether collected or sampled) may not adequately support index usage. Typically, when the search parameter begins with “value%”, the optimizer may opt for a NUSI traversal instead of “%value%”. This preference stems from the storage manner of statistic histograms in Teradata.
Note that the use of LIKE operators prevents a binary search in NUSI data blocks due to sorting the row pointer array by ROWHASH of indexed columns, which LIKE operators cannot hash. This limits access to a full table scan of the NUSI sub-table, which is only feasible when the number of NUSI data blocks to be read is significantly lower than the number of base table data blocks accessed with an FTS.
The Solution without Teradata Indexing
Fortunately, there is a great technique to prevent a complete table scan on a broad table.
We generate a condensed version of the big table, comprising solely the primary key columns and those essential for fulfilling the LIKE operator’s requirements. To employ this technique, it is crucial that the primary key and primary index of the table match.
We will join our auxiliary table with the wide table instead of selecting directly from the latter.
t01.primary_key = t02.primary_key
t02.column_A LIKE ‘%A%’;
If all necessary statistics are available for the query, the optimizer will generate an execution plan that is less demanding on resources than the original plan.
Performing a complete scan of the minor auxiliary table will isolate only the rows that fulfill the LIKE operator and transfer them to the spool. Since the supporting table is significantly smaller, there will be a significant decrease in IO.
The second step involves a swift and AMP-local primary index merge join of the spooled records with the wide table.
The above solution was effectively executed in multiple telecommunication projects, which involved carrying out daily queries using LIKE operators on massive call detail record (CDR) tables. Typically, the generation of supplementary tables is deferred until the nightly batch load window.