Top Facts about Teradata NUSI Selectivity

2
692
teradata nusi selectivity

teradata nusi selectivity

The primary goal of this article is to show you how the number of rows per base table data block impacts the selectivity.

The example we are using will show you exactly how NUSI selectivity is related to the average data block size,  the average row size and the number of rows per data block.

First, we prepared an example table called NUSI_SELECTIVITY. It contained about 270.000 rows. PK is a unique number we used for even data distribution (as you can see, it is the Primary Index column) and COL_A was being used for row qualification:

CREATE MULTISET TABLE INDEXING.NUSI_SELECTIVITY
(
PK INTEGER,

COL_A CHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PK );

SELECT COUNT(*) FROM INDEXING.NUSI_SELECTIVITY
269.378

We created an NUSI on the column COL_A of this table:

CREATE INDEX (COL_A) ON INDEXING.NUSI_SELECTIVITY;

In our example table, we were using only fixed length data types, as this permitted us to calculate with a fixed row length. Each row in the example table occupied 14 bytes (the integer column PK  4 bytes, COL_A another 10 bytes).

We were populating the example table with below query (SOME_TABLE was only used to generate some random data):

INSERT INTO INDEXING.NUSI_SELECTIVITY
SELECT
  PK,
  CASE WHEN PK <= 19.332 THEN ‘A’ ELSE ‘B’ END
FROM
  INDEXING.SOME_TABLE;

We collected statistics on COL_A as we wanted to reproduce reliable results, not relying on random AMP sampling:

COLLECT STATS ON INDEXING.NUSI_SELECTIVITY COLUMN(COL_A);

The query we have been using for row qualification looked like this (‘A’ is the value we use for testing the selectivity):

SELECT * FROM INDEXING.NUSI_SELECTIVITY  WHERE COL_A = ‘A’

Over the course of our preparations (spending a lot of time), we determined that 19.332 is the maximum number of qualifying rows for this example table which allows for NUSI usage. We operated at the threshold where, with only one more record selected in our query, the optimizer switched from NUSI usage to a full table scan.

With 19.332 qualifying rows we got below explain output, showing clearly that the NUSI was used (“by way of index #4):

EXPLAIN SELECT * FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = ‘A’

1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for INDEXING.NUSI_SELECTIVITY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
3) We do an all-AMPs RETRIEVE step from INDEXING.NUSI_SELECTIVITY by
way of index # 4 “INDEXING.NUSI_SELECTIVITY.COL_A = ‘A ‘” with no
residual conditions into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with high
confidence to be 19,332 rows (676,620 bytes). The estimated time
for this step is 0.58 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.58 seconds.

If we only selected one more row (19.333 rows), the optimizer considered the NUSI usage too costly and was going for a full table scan instead.

DELETE FROM INDEXING.NUSI_SELECTIVITY;
INSERT INTO INDEXING.NUSI_SELECTIVITY
SELECT 
  PK,
  CASE WHEN PK <= 19.333 THEN ‘A’ ELSE ‘B’ END
FROM 
  INDEXING.SOME_TABLE;
COLLECT STATS ON INDEXING.NUSI_SELECTIVITY COLUMN(COL_A);

With 19.333 qualifying rows, the optimizer did an FTS on the base table (“by way of an all-rows scan”):

EXPLAIN SELECT * FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = ‘A’;

1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for
INDEXING.NUSI_SELECTIVITY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY by way of an all-rows scan with a
condition of (“INDEXING.NUSI_SELECTIVITY.COL_A = ‘A ‘”) into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (1,086,165 bytes). The estimated
time for this step is 0.82 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.82 seconds.

As we found out the exact number of rows, where the optimizer switched from an NUSI usage to a full table scan, we tried to prove how to average data block size, average row size and rows per data block are influencing selectivity.

We expected that fewer rows per data block should work for the NUSI as the rows are distributed across more base table data blocks, requiring the FTS to move more data blocks to the FSG cache.

Before we go ahead with our example, let us determine the selectivity in percent. 19.332/269.378 gives us about 7.1%.

As long as we select less than 7.1% of the rows of our example table NUSI_SELECTIVITY, the optimizer will use the NUSI on column COL_A.

Let us move on now to the more interesting part of our case. We will prove that selectivity depends on the number of rows fitting into one data block. As data blocks are the minimum units which can be moved from disk to FSG cache, IO considerations done by the optimizer are based on data blocks and not on row counts!

One divided by the number of rows per data block is the limit between FTS and NUSI usage because this means (assuming even row distribution) that on average each data block contains one qualifying row. In this case, an NUSI would need to touch all data blocks anyway so that an FTS would be the better choice.

What we did next is, that we created a copy of our example table called SELECTIVITY_COPY and increased the datatype of COL_A from CHAR(10) to CHAR(20). As a result, we can store fewer rows per base table data block:

CREATE MULTISET TABLE INDEXING.NUSI_SELECTIVITY_COPY ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
PK INTEGER,
COL_A CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( PK );

We expected a shift in NUSI usage as the optimizer should be able to use the new table with weaker selectivity.

Similar to the first part of our example, we calculated the number of rows which separate FTS from NUSI usage. We found out that with COL_A being of data type character(20) we were able to select up to 24.136 rows before the optimizer switched to an FTS! The selectivity increased to 9%.

As expected, selectivity shifted as the number of rows per data block decreased.

I was able to create similar setups from above example, where NUSI selectivity was as low as 0.1% and as high as 40%!
It’s clear that the rule-of-thumb, written everywhere over the internet, namely that the NUSI selectivity is given up to about 10% is wrong.

Note, that even switching the character set from Latin to Unicode would cause a shift in selectivity  (as Unicode characters need 2 bytes, latin characters only one byte).

The important lesson here is:  Changing the data type of the table alone can lead to a shift in your index usage landscape!

As an additional teaching, I would like to show you another example, where the outcome was surprising:

For this example, we used our second example table NUSI_SELECTIVITY_COPY, but any other table would have worked as well. Just keep in mind that the table contained one record too much for an NUSI usage.

This additional example is about coverage. We ran our query again, but we only selected  col_a, basically making the NUSI  covering:

EXPLAIN SELECT COL_A FROM INDEXING.NUSI_SELECTIVITY WHERE COL_A = ‘A’;

We assumed, that switching to a covering query should have resulted in a cost reduction (as no base table lookup via the ROWID has to be done) and should have weakened the optimizer selectivity requirements again. As our example table was exactly at the limit between NUSI usage and FTS, our idea was that shift, only by one record, should have introduced the NUSI usage.

But as the execution plan revealed, coverage did not have any impact on NUSI usage, and we still had an FTS! I consider this a surprising result as even the Teradata Documentation tells us the myth that coverage will increase the chance of an NUSI usage…maybe this is valid for other releases of Teradata. I did my tests on TD 14.10:

 

EXPLAIN SELECT COL_A FROM INDEXING.NUSI_SELECTIVITY_COPY WHERE COL_A = ‘A’

1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for
INDEXING.NUSI_SELECTIVITY_COPY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY_COPY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY_COPY by way of an all-rows scan with a
condition of (“INDEXING.NUSI_SELECTIVITY_COPY.COL_A = ‘A ‘”) into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (989,617 bytes). The estimated time
for this step is 0.81 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.81 seconds.

So much for NUSI selectivity.

I hope you enjoyed it and could get a lot of information out of this article!

Please don’t forget to watch our video course about indexing techniques on Teradata:

Thanks.

See also:
The Teradata USI compendium
The Secondary Index in Teradata – The NUSI compendium

Our Reader Score
[Total: 6    Average: 5/5]
Top Facts about Teradata NUSI Selectivity written by Roland Wenzlofsky average rating 5/5 - 6 user ratings

2 COMMENTS

  1. Thank you very much. It is a big honour to hear this from Tera-Tom himself 😉

    I am a big fan of your Teradata books I remember, the first one i bought was “Tera-Tom on Database Administration” for V2R6 which helped me a lot to better understand the Teradata architecture.

    Thanks again,

    Roland

LEAVE A REPLY

Please enter your comment!
Please enter your name here