The primary goal of this article is to show how the number of rows per base-table data block impacts NUSI selectivity. Our example demonstrates how Teradata NUSI selectivity relates to the average data block size, the average row size, and the number of rows per data block.
First, we prepared an example table called NUSI_SELECTIVITY containing about 270,000 rows. PK is a unique number we used for even data distribution (as you can see, it is the Primary Index column), and COL_A was used for row qualification:
CREATE MULTISET TABLE INDEXING.NUSI_SELECTIVITY
(
PK INTEGER,
COL_A CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );
SELECT COUNT(*) FROM INDEXING.NUSI_SELECTIVITY;
269,378
We created a NUSI on the column COL_A of this table:
CREATE INDEX (COL_A) ON INDEXING.NUSI_SELECTIVITY;
Our example table used only fixed-length data types, allowing us to work with a fixed row length. Each row in the example table occupied 14 bytes (4 bytes for the INTEGER column PK, another 10 bytes for COL_A).
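If you want to double-check the physical footprint, the average physical row size can be estimated from the standard DBC.TableSizeV view. This is only a sketch; the result includes row and block overhead, so expect a value somewhat above the 14 payload bytes:
-- Estimate the average physical row size from permanent space;
-- 269378 is the row count we determined above.
SELECT SUM(CurrentPerm) / 269378.00 AS AvgBytesPerRow
FROM DBC.TableSizeV
WHERE DatabaseName = 'INDEXING'
AND TableName = 'NUSI_SELECTIVITY';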
We populated the example table with the query below (SOME_TABLE was only used to generate some random data):
INSERT INTO INDEXING.NUSI_SELECTIVITY
SELECT
PK,
CASE WHEN PK <= 19332 THEN 'A' ELSE 'B' END
FROM
INDEXING.SOME_TABLE;
We collected statistics on COL_A because we wanted reproducible, reliable results rather than relying on random-AMP sampling:
COLLECT STATS ON INDEXING.NUSI_SELECTIVITY COLUMN(COL_A);
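To verify that the optimizer sees the collected demographics, the statistics can be inspected, for example with:
-- Show the statistics collected on the table
HELP STATISTICS INDEXING.NUSI_SELECTIVITY;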
The query we used for row qualification looked like this ('A' is the value we used to test selectivity):
SELECT * FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = 'A';
During our preparations (which took a lot of time), we determined that 19,332 is the maximum number of qualifying rows that still allows NUSI usage for this example table. We operated exactly at the threshold where selecting only one more record made the optimizer switch from the NUSI to a full table scan.
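To confirm that we were exactly at the threshold, we can simply count the qualifying rows (a quick check not shown in the original test protocol):
-- Count the rows qualifying for the NUSI access path
SELECT COUNT(*) FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = 'A';
-- Expected result: 19332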
With 19,332 qualifying rows, we got the explain output below, showing that the NUSI was used ("by way of index # 4"):
EXPLAIN SELECT * FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = 'A';
1) First, we lock a distinct INDEXING."pseudo table" for read on a
RowHash to prevent global deadlock for INDEXING.NUSI_SELECTIVITY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
3) We do an all-AMPs RETRIEVE step from INDEXING.NUSI_SELECTIVITY by
way of index # 4 "INDEXING.NUSI_SELECTIVITY.COL_A = 'A '" with no
residual conditions into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with high
confidence to be 19,332 rows (676,620 bytes). The estimated time
for this step is 0.58 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.58 seconds.
If we selected only one more row (19,333 rows), the optimizer considered NUSI usage too costly and went for a full table scan instead.
DELETE FROM INDEXING.NUSI_SELECTIVITY;
INSERT INTO INDEXING.NUSI_SELECTIVITY
SELECT
PK,
CASE WHEN PK <= 19333 THEN 'A' ELSE 'B' END
FROM
INDEXING.SOME_TABLE;
COLLECT STATS ON INDEXING.NUSI_SELECTIVITY COLUMN(COL_A);
With 19,333 qualifying rows, the optimizer did an FTS on the base table ("by way of an all-rows scan"):
EXPLAIN SELECT * FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = 'A';
1) First, we lock a distinct INDEXING."pseudo table" for read on a
RowHash to prevent global deadlock for
INDEXING.NUSI_SELECTIVITY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY by way of an all-rows scan with a
condition of ("INDEXING.NUSI_SELECTIVITY.COL_A = 'A '") into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory but is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (1,086,165 bytes). The estimated
time for this step is 0.82 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.82 seconds.
We had found the exact number of rows at which the optimizer switches from NUSI usage to a full table scan. Next, we wanted to show how the average data block size, the average row size, and the number of rows per data block influence selectivity.
We expected that fewer rows per data block would work in favor of the NUSI, as the rows are then spread across more base-table data blocks, forcing the FTS to move more data blocks into the FSG cache.
Before we go ahead with our example, let us determine the selectivity as a percentage: 19,332 / 269,378 gives us about 7.2%.
As long as we select fewer than about 7.2% of the rows of our example table NUSI_SELECTIVITY, the optimizer will use the NUSI on column COL_A.
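You can let Teradata do this arithmetic directly; a minimal sketch:
-- Selectivity of COL_A = 'A' as a percentage of all rows
SELECT CAST(SUM(CASE WHEN COL_A = 'A' THEN 1 ELSE 0 END) AS DECIMAL(18,4))
* 100 / COUNT(*) AS SelectivityPct
FROM INDEXING.NUSI_SELECTIVITY;
-- Returns roughly 7.2 for 19,332 qualifying rows out of 269,378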
Let us move on to the more exciting part of our case: proving that selectivity depends on the number of rows fitting into one data block. Because data blocks are the smallest units that Teradata can move from disk into the FSG cache, the optimizer's IO costing is based on data blocks, not on row counts!
A selectivity of one divided by the number of rows per data block marks the limit between FTS and NUSI usage. At this point (assuming even row distribution), each data block contains, on average, one qualifying row. The NUSI would need to touch all data blocks anyway, so an FTS is the better choice.
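Plugging in our numbers: the observed threshold selectivity is 19,332 / 269,378 ≈ 0.0718 ≈ 1 / 13.9. If the model above holds, roughly 14 rows fit into one data block in this particular setup.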
Next, we created a copy of our example table called NUSI_SELECTIVITY_COPY and increased the data type of COL_A from CHAR(10) to CHAR(20). As a result, fewer rows fit into one base-table data block:
CREATE MULTISET TABLE INDEXING.NUSI_SELECTIVITY_COPY ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
PK INTEGER,
COL_A CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( PK );
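The remaining preparation steps for the copy are not shown above; they mirror those of the first table. A sketch, assuming the copy is filled from the original table:
-- Populate the copy, create the NUSI, and collect statistics
INSERT INTO INDEXING.NUSI_SELECTIVITY_COPY
SELECT PK, COL_A FROM INDEXING.NUSI_SELECTIVITY;
CREATE INDEX (COL_A) ON INDEXING.NUSI_SELECTIVITY_COPY;
COLLECT STATS ON INDEXING.NUSI_SELECTIVITY_COPY COLUMN(COL_A);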
We expected a shift in NUSI usage: on the new table, the optimizer should stick with the NUSI even at weaker selectivity.
As in the first part of our example, we determined the number of rows separating FTS from NUSI usage. We found that with COL_A being CHAR(20), we could select 24,136 rows before the optimizer switched to an FTS! The selectivity threshold increased to about 9%.
As expected, selectivity shifted as the number of rows per data block decreased.
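To summarize the two measurements:
Table                   COL_A      Row size   Threshold rows   Selectivity
NUSI_SELECTIVITY        CHAR(10)   14 bytes   19,332           ~7.2%
NUSI_SELECTIVITY_COPY   CHAR(20)   24 bytes   24,136           ~9.0%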
I created similar setups based on the above example, in which the NUSI selectivity threshold was as low as 0.1% and as high as 40%!
The rule of thumb found all over the internet, namely that NUSIs are used up to a selectivity of about 10%, is wrong.
Switching the character set from Latin to Unicode would likewise cause a shift in selectivity (as Unicode characters need 2 bytes, Latin characters only one byte).
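For illustration, here is a hypothetical Unicode variant of the table (we did not measure this one); COL_A would occupy 20 bytes instead of 10:
CREATE MULTISET TABLE INDEXING.NUSI_SELECTIVITY_UNICODE
(
PK INTEGER,
COL_A CHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC
) PRIMARY INDEX ( PK );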
The important lesson here is that changing the table's data types alone can shift your index usage landscape!
As an additional lesson, I would like to show you another example where the outcome was surprising.
We used our second example table, NUSI_SELECTIVITY_COPY, but any other table would have worked. Just keep in mind that the table contained one record too many for NUSI usage.
This additional example is about coverage. We reran our query, but selected only COL_A, basically making the NUSI covering:
EXPLAIN SELECT COL_A FROM INDEXING.NUSI_SELECTIVITY_COPY WHERE COL_A = 'A';
We assumed that switching to a covering query would result in a cost reduction (as no base-table lookup via the ROWID has to be done), weakening the optimizer's selectivity requirements again. As our example table was precisely at the limit between NUSI usage and FTS, our idea was that a shift of only one record should have brought back NUSI usage.
But as the execution plan revealed, coverage did not impact NUSI usage, and we still got an FTS! I consider this a surprising result. Even the Teradata documentation tells us that coverage increases the chance of NUSI usage; maybe this is valid for other Teradata releases. I did my tests on TD 14.10:
EXPLAIN SELECT COL_A FROM INDEXING.NUSI_SELECTIVITY_COPY WHERE COL_A = 'A';
1) First, we lock a distinct INDEXING."pseudo table" for read on a
RowHash to prevent global deadlock for
INDEXING.NUSI_SELECTIVITY_COPY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY_COPY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY_COPY by way of an all-rows scan with a
condition of ("INDEXING.NUSI_SELECTIVITY_COPY.COL_A = 'A '") into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory but is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (989,617 bytes). The estimated time
for this step is 0.81 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.81 seconds.
So much for NUSI selectivity.
I hope you enjoyed it and took away a lot of useful information from this article!
Please don't forget to watch our video course about indexing techniques on Teradata.
Thanks.
See also:
The Teradata USI compendium
The Secondary Index in Teradata – The NUSI compendium