When Teradata Indexing fails
We all have been in the situation where we had to select rows from extensive tables containing billions of rows. While this is mild on resource consumption with direct access to the rows (Primary Index, USI), it becomes a resource killer if the rows have to be selected by using a LIKE operator:
SELECT * FROM wide_table WHERE column_B LIKE ‘%A%’;
Most of the time, without further measures, the above query will be resolved with a full table scan (FTS).
To reduce disk IO and query runtime, the first idea might be to use a NUSI or single table join index (STJI).
While an STJI could be a good choice, it will be tough to convince the optimizer to replace the FTS on the base table with an FTS on a NUSI sub-table, as statistics (sampled or collected) may not be sufficient to consider index usage (in general, “LIKE ‘value%'” will more often allow the optimizer to do a NUSI traversal than “LIKE ‘%value%'”, this is related to the way statistic histograms are stored on Teradata).
Furthermore, be aware that LIKE operators disallow a binary search within the NUSI data blocks as the row pointer array is sorted by ROWHASH of the indexed columns (or a 4-byte integer value), but the LIKE operator is not hashable. The only kind of access would be a full table scan of the NUSI sub-table, which makes sense if the number of NUSI data blocks to be read is significantly lower than the number of base table data blocks accessed with an FTS.
The Solution without Teradata Indexing
Luckily, there exists a fantastic trick to avoid the full table scan on the wide table:
We create a copy of the comprehensive table, containing only a subset of columns, namely the primary key columns and only those needed to satisfy the LIKE operator. The table’s primary Key and Primary Index has to match to use this trick!
Instead of directly selecting from the wide table, we will join the wide table with our auxiliary table:
t01.primary_key = t02.primary_key
t02.column_A LIKE ‘%A%’;
In case all needed statistics are available to satisfy the above query, the optimizer will create an execution plan much more benign on resource usage than the original one:
A full table scan on the small auxiliary table will be done. Only the rows satisfying the LIKE operator are moved into the spool. The supplementary table is much smaller, leading to a considerable reduction in IO.
In a second step, a primary index merge join (AMP-local and fast) between the spooled records and the wide table is done.
The solution described above was successfully implemented in several telecommunication projects, where queries with LIKE operators on huge call detail record (CDR) tables are done daily. The creation of supplementary tables is usually moved into the nightly batch load window.