Category Archives for "Indexing"

Top Facts about Teradata NUSI Selectivity

The primary goal of this article is to show you how the number of rows per base table data block impacts the selectivity.

The example we are using will show you exactly how NUSI selectivity is related to the average data block size,  the average row size and the number of rows per data block.

First, we prepared an example table called NUSI_SELECTIVITY. It contained about 270.000 rows. PK is a unique number we used for even data distribution (as you can see, it is the Primary Index column) and COL_A was being used for row qualification:




We created an NUSI on the column COL_A of this table:


In our example table, we were using only fixed length data types, as this permitted us to calculate with a fixed row length. Each row in the example table occupied 14 bytes (the integer column PK  4 bytes, COL_A another 10 bytes).

We were populating the example table with below query (SOME_TABLE was only used to generate some random data):

  CASE WHEN PK <= 19.332 THEN ‘A’ ELSE ‘B’ END

We collected statistics on COL_A as we wanted to reproduce reliable results, not relying on random AMP sampling:


The query we have been using for row qualification looked like this (‘A’ is the value we use for testing the selectivity):


Over the course of our preparations (spending a lot of time), we determined that 19.332 is the maximum number of qualifying rows for this example table which allows for NUSI usage. We operated at the threshold where, with only one more record selected in our query, the optimizer switched from NUSI usage to a full table scan.

With 19.332 qualifying rows we got below explain output, showing clearly that the NUSI was used (“by way of index #4):


1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for INDEXING.NUSI_SELECTIVITY.
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
way of index # 4 “INDEXING.NUSI_SELECTIVITY.COL_A = ‘A ‘” with no
residual conditions into Spool 1 (group_amps), which is built
locally on the AMPs. The size of Spool 1 is estimated with high
confidence to be 19,332 rows (676,620 bytes). The estimated time
for this step is 0.58 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.58 seconds.

If we only selected one more row (19.333 rows), the optimizer considered the NUSI usage too costly and was going for a full table scan instead.

  CASE WHEN PK <= 19.333 THEN ‘A’ ELSE ‘B’ END

With 19.333 qualifying rows, the optimizer did an FTS on the base table (“by way of an all-rows scan”):


1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for
2) Next, we lock INDEXING.NUSI_SELECTIVITY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY by way of an all-rows scan with a
condition of (“INDEXING.NUSI_SELECTIVITY.COL_A = ‘A ‘”) into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (1,086,165 bytes). The estimated
time for this step is 0.82 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.82 seconds.

As we found out the exact number of rows, where the optimizer switched from an NUSI usage to a full table scan, we tried to prove how to average data block size, average row size and rows per data block are influencing selectivity.

We expected that fewer rows per data block should work for the NUSI as the rows are distributed across more base table data blocks, requiring the FTS to move more data blocks to the FSG cache.

Before we go ahead with our example, let us determine the selectivity in percent. 19.332/269.378 gives us about 7.1%.

As long as we select less than 7.1% of the rows of our example table NUSI_SELECTIVITY, the optimizer will use the NUSI on column COL_A.

Let us move on now to the more interesting part of our case. We will prove that selectivity depends on the number of rows fitting into one data block. As data blocks are the minimum units which can be moved from disk to FSG cache, IO considerations done by the optimizer are based on data blocks and not on row counts!

One divided by the number of rows per data block is the limit between FTS and NUSI usage because this means (assuming even row distribution) that on average each data block contains one qualifying row. In this case, an NUSI would need to touch all data blocks anyway so that an FTS would be the better choice.

What we did next is, that we created a copy of our example table called SELECTIVITY_COPY and increased the datatype of COL_A from CHAR(10) to CHAR(20). As a result, we can store fewer rows per base table data block:


We expected a shift in NUSI usage as the optimizer should be able to use the new table with weaker selectivity.

Similar to the first part of our example, we calculated the number of rows which separate FTS from NUSI usage. We found out that with COL_A being of data type character(20) we were able to select up to 24.136 rows before the optimizer switched to an FTS! The selectivity increased to 9%.

As expected, selectivity shifted as the number of rows per data block decreased.

I was able to create similar setups from above example, where NUSI selectivity was as low as 0.1% and as high as 40%!
It’s clear that the rule-of-thumb, written everywhere over the internet, namely that the NUSI selectivity is given up to about 10% is wrong.

Note, that even switching the character set from Latin to Unicode would cause a shift in selectivity  (as Unicode characters need 2 bytes, latin characters only one byte).

The important lesson here is:  Changing the data type of the table alone can lead to a shift in your index usage landscape!

As an additional teaching, I would like to show you another example, where the outcome was surprising:

For this example, we used our second example table NUSI_SELECTIVITY_COPY, but any other table would have worked as well. Just keep in mind that the table contained one record too much for an NUSI usage.

This additional example is about coverage. We ran our query again, but we only selected  col_a, basically making the NUSI  covering:


We assumed, that switching to a covering query should have resulted in a cost reduction (as no base table lookup via the ROWID has to be done) and should have weakened the optimizer selectivity requirements again. As our example table was exactly at the limit between NUSI usage and FTS, our idea was that shift, only by one record, should have introduced the NUSI usage.

But as the execution plan revealed, coverage did not have any impact on NUSI usage, and we still had an FTS! I consider this a surprising result as even the Teradata Documentation tells us the myth that coverage will increase the chance of an NUSI usage…maybe this is valid for other releases of Teradata. I did my tests on TD 14.10:


1) First, we lock a distinct INDEXING.”pseudo table” for read on a
RowHash to prevent global deadlock for
2) Next, we lock INDEXING.NUSI_SELECTIVITY_COPY for read.
3) We do an all-AMPs RETRIEVE step from
INDEXING.NUSI_SELECTIVITY_COPY by way of an all-rows scan with a
condition of (“INDEXING.NUSI_SELECTIVITY_COPY.COL_A = ‘A ‘”) into
Spool 1 (group_amps), which is built locally on the AMPs. The
input table will not be cached in memory, but it is eligible for
synchronized scanning. The size of Spool 1 is estimated with high
confidence to be 24,137 rows (989,617 bytes). The estimated time
for this step is 0.81 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.81 seconds.

So much for NUSI selectivity.

I hope you enjoyed it and could get a lot of information out of this article!

Please don’t forget to watch our video course about indexing techniques on Teradata:


See also:
The Teradata USI compendium
The Secondary Index in Teradata – The NUSI compendium


NUSI – The Nonunique Secondary Index in Teradata

The Technical Details of the Non-Unique Secondary Index in Teradata – NUSI

Many concepts of Teradata indexing are tightly coupled to the physical storage of data rows. Therefore, I highly recommended reading our article series about physical storage:

The Teradata Non-Unique Secondary Index (NUSI) is an alternative data access path which can help to reduce IOs.

While B+ Tree Indexes are a quite commonly used index mechanism in relational database systems, Teradata uses a different approach based on hashing.

Teradata distinguishes between unique secondary indexes (USI) and non-unique secondary indexes (NUSI). Although the naming may seduce to assume they are technically the same, they are not at all. This article is related to the non-unique Teradata secondary index or NUSI.

As opposed to the USI, index row distribution of Non-Unique secondary indexes is not based on the same hashing algorithm Teradata uses to distribute base table rows evenly across all AMPs

Non-unique secondary indexes are sub-tables; they do not show up as real objects in the DBC data dictionary tables (in contrast to Join Indexes).

Base table rows are hashed by the primary index column values, while the nonunique secondary index rows are co-located with their base table rows on the same AMP:

NUSI rows are not hash distributed.

While the USI-access is always a 2-AMP operation (on rare occasions even a single AMP operation), NUSI access typically requires an All-AMP access of the index subtable.

Only if the NUSI consists of the same columns as the Primary Index of its base table, the index rows are searched on one AMP (the Teradata Optimizer knows in this case that only one AMP contains the rows of the base table and the index subtable).

Each NUSI row contains the ROWIDs of the related base table rows and to look up the base table row if the index is not covering.

NUSI index rows are not hash distributed, NUSI access always is an All-AMP access to the NUSI sub-table. In case the Teradata optimizer decides to use an existing NUSI instead of doing an FTS on the base table (based on cost considerations), but the NUSI index is not covering, the following approach is taken (on all AMPs at the same time):

At first,  the AMPs copy qualifying index rows into an AMP-local spool. This spool file contains the extracted base table ROWIDs.
Next, this spool file is used to access the base table rows.

Hash ordering versus value ordering

The rows in a Teradata NUSI sub-table can be ordered by ROWHASH or another 4-Byte Integer value, depending on intended use.

If you need to retrieve a range of dates, a value ordered NUSI on the data column makes sense (dates can be utilized, as Teradata stores them as 4-byte integer values). Before the availability of join indexes and partitioning, this was the only method to access ranges of dates in an optimized way.

If you have equality WHERE conditions on your query columns and the same columns in your NUSI, row hash ordering is probably the way to go. In this case, the AMPs can look up the searched value by doing a binary search on the NUSI index subtable rows which is the most efficient way of finding a data row.

Value ordering can only be used on exactly one 4-byte integer column. The reason for this limitation is the data block design and how rows are stored in a data block:

Teradata Secondary Index

Inside the data block, only the array of pointers to the start of each row is kept in order while rows are inserted wherever sufficient space is available.
Rows are not kept sorted as this would require permanent defragmentation activity.

In the case of row hash ordering, each entry in the row pointer array contains the row hash value (plus uniqueness value) of the related row. The row hash sorts the array. If the index is value ordered, each pointer array entry contains the value which is used for ordering, and this value sorts the array.

By design, the pointer array can hold the 4-byte row hash or any other 4-byte integer value. Without this limitation, the block structure could not be fixed as the array values stored could be of arbitrary length. For example, imagine you could use a character (2000) column for value ordering, in this case, you would need 2000 bytes space for each entry in the pointer array.

Any data type which is stored internally as an integer of 4 bytes can be used for value ordering, including for example INTEGER, DATE, DECIMAL (n, 0) where n occupies a maximum of 4 bytes.

Having accurate statistics is important. Any row qualification condition on the NUSI columns has to be highly selective. Otherwise, the optimizer is not going to use the NUSI. In the absence of collected statistics, the optimizer will do sample statistics, but the cost advantage for an NUSI usage is estimated more conservative in this case. On the other hand, table skewing may lead to the inappropriate use of the NUSI, even increasing the costs of the data access.

Requirements for a Teradata NUSI to be considered by the Optimizer

Teradata documentation says that if more than 10% of the table rows are qualifying, an existing NUSI is not used – this is only a rule of thumb. I have seen NUSIs being used where as much as 35% of the rows qualified and I have seen NUSIs being avoided by the optimizer although less than 1% of the table rows qualified.

There are other measures available to make better estimations about index usage:

The optimizer will only use a NUSI if the costs of accessing the qualifying rows in the index sub-table plus the costs of reading the base table data blocks containing the qualifying rows together are less than the expenses of a full table scan on the base table. There are additional considerations to be done in case several NUSI indexes are combined, or NUSI Bitmapping is applied.

The optimizer estimates the number of base table data blocks accessed (without index usage) based on the estimated qualifying rows for the considered index value, the data block size, and the row length. The qualifying rows are based on statistics (collected or sampled).

Let’s assume we have an average data block size of 1.000 bytes and we store rows with an average row length of 100 bytes. On average, we can store ten rows per data block.

Only if less than 10% (1 out of 10) of the rows qualify, the NUSI access will be considered efficient by the optimizer. Otherwise, a full table scan will be done. You may ask yourself, why? Because, going back to our example,  if we have to select more than 1 row out of 10, we are on average reading about one row per data block which is equal to a full table scan.

In our example, if we have less than 10% qualifying rows, the NUSI will be considered by the optimizer as not all base table data blocks will contain qualifying rows, but definitely, would have to be read in case of a full table scan.

Data blocks on the base table can be binary searched as rows are stored in row hash order. A full table scan directly on the base table would require checking all rows in all blocks sequentially!

Let us now take a look at some practical examples to carve out in which case a NUSI will be considered by the optimizer (always assuming the needed statistics are available and selectivity is strong):

Example Index on columns Equality WHERE condition Index Usage
1 {A, B} A Never used
2 {A,B} A, B Used if strong selectivity is given
3 {A,B} A,B,C Used if strong selectivity is given
4 Two separate NUSI, AND combined: {A},{B} A = ‘value1’ AND B = ‘value2’ Only one NUSI will be used (the one with the stronger selectivity).
5 Two separate NUSI, OR combined: {A},{B} A = ‘value1’ OR B = ‘value2’ Both NUSI will be used if strong selectivity is given
6 {A} LIKE ‘%value%’ Used if strong selectivity is given( NUSI subtable scan is done, no binary search)

May not seem obvious at first, but there is a good reason why the optimizer never decides to use the NUSI. We just have to remember how rows are stored in the index sub-table data blocks. A hash ordered NUSI would store the array of row pointers sorted by row hash value.  As we only have a constraint on column A, we cannot use the index hash value of {A, B} to execute a binary search on the index data blocks. Only a full table scan on the index sub-table would be available, but the optimizer never considers this an option.Example 1:

Example 2:

Example 2 is a perfect setup for the NUSI usage. Assuming that selectivity is strong, the optimizer will use the NUSI. The Row hash of the index matches the row hash of the WHERE condition columns {A, B} which allows the calculation of the hash value of the WHERE condition. By knowing this hash value, the rows can easily be looked up in the index sub-table applying a binary search on the row pointer array.

Example 3:

Not as perfect as Example 2, but still worth to be considered by the optimizer: only columns {A, B} of the WHERE condition are used for hash value calculation i.e. again row hash values match with the row hash values of the NUSI sub-table.

A binary search on the row pointer array is executed, the remaining qualifier on column C is applied as the residual condition.

Example 4:

Two or more NUSI which are combined with AND may be used together in the following way (at least one has to have strong selectivity):

If several NUSI is available, the optimizer will choose exactly one of them, namely the NUSI with the greatest selectivity. Nevertheless, there is another method of combining several NUSI with weak selectivity: NUSI Bitmapping. We will talk about this soon.

Example 5:

Two or more NUSI combined with OR (all of them need high selectivity) may be used together in the following way:

The ROWIDs are retrieved from index {A} and index {B} and moved into a common spool. ROWID duplicates are removed from the spool; this requires a spool sort step. If the index is not covering, base table rows are accessed via the ROWIDs.

Some important remarks:

(1) The optimizer considers the sort step mentioned above as very costly. Even if selectivity is high, if many rows have to be sorted, the optimizer may reject NUSI usage.

(2) If you have two WHERE conditions which are OR combined, but only a NUSI on one of them, the optimizer will always choose a full table scan (which should be obvious, as the WHERE condition without NUSI anyway requires an FTS, there is no value in accessing the available NUSI)

(3) If the optimizer rejects the usage of two NUSIs because it considers the sort step too costly, the query can be rewritten to enforce the usage of both NUSI:

SELECT * FROM table WHERE colA = ‘value1’ OR colB = ‘value2’

can be rewritten to:

SELECT * FROM table WHERE colA = ‘value1’
SELECT * FROM table WHERE colB = ‘value2’

If both NUSIs have high selectivity, the optimizer will use them (as the sort step has moved to the “execution logic” and is not any more input to the optimizer cost estimation).

Example 6:

A full table scan accesses the index rows on the index sub-table. No binary search on the data blocks can take place; each block has to be checked (therefore full table scan). Nevertheless, the optimizer may use the index if it assumes the index sub-table FTS to take place on a much smaller table than the base table might be. Be aware, only like ‘x%’ queries will work as the optimizer can derive estimations from available statistic histograms on the index column. Something like ‘%x%’ does not operate as no statistics can be calculated.

The Value Ordered NUSI

We were already talking shortly about value ordering. A value ordered NUSI is especially useful for range scans, as the retrieval of rows in a range of dates. Master Index and Cylinder Index hold information about all NUSI data blocks containing qualifying rows. The search within a NUSI data block can be stopped as soon as the looked up value in the row pointer array moves outside the searched value;  typically is an FTS on the NUSI subtable on a limited range within the data blocks.

NUSI Bitmapping

The optimizer can use NUSI bitmapping to combine several NUSI with weak selectivity. Bitmapping is a proprietary algorithm which comes with its cost overhead, making usage for us even less predictable.  NUSI bitmapping only is available if more than one WHERE condition is combined with AND. NUSI Bitmapping is used if several NUSI with weak selectivity can be combined and the combined usage results in high selectivity.

Let me give you an example: Consider a table has two columns:

The first column is the GENDER, and the table contains 50% men and 50% women. A NUSI on gender probably would have weak selectivity.
The second column is a “has beard” flag. It is a flag with two manifestations “yes” or “no.” Let’s assume most of the men in our example table have a beard. A NUSI on “has a beard” probably has weak selectivity.

None of both NUSI alone would be used by the optimizer as an FTS would probably be cheaper.

Now, as you may know, we have a song contest in Europe. This time a lady with a beard won the song contest. As you can imagine, this is a very rare combination. If we would look up this person in our table, both NUSI could be combined, and together they would have high selectivity.

Index Coverage / The Index-Only access

NUSI full or partial coverage is possible. Full coverage means that no access to the base table rows is needed to fulfill the request. Otherwise, base table rows have to be extracted and a spool is used to access the base table rows in a second step. NUSI full coverage is given if any of the following prevails:

  • No reference to columns other than those in the NUSI
  • NUSI includes all columns in the base table referenced and upon reference to a character column is has to be either NOT CASESPECIFIC or UPPERCASE

The usual approach for testing a NUSI is:

Create the NUSI you would like to be used and check if it’s indeed used (EXPLAIN statement says something like “by traversing index #1” or “by way of index #1”). If it is used keep it, otherwise drop it (remember, secondary indexes are physical tables occupying disk space).

What is the appropriate Teradata NUSI Workload?

A USI allows direct access to the data records (I try to avoid the term “data rows” in times of column stores) and is appropriate for workload where direct access is an advantage (for example: looking up a single customer). NUSI access seems more useful than USI access in cases you need to access many data rows at once. The recommended workload is, therefore, decision support. OLTP environments tend to include a load of DML operations causing substantial index maintenance.

Teradata Load Utilities and the NUSI

It is not possible to load into a table with the Fastload utility if there is a NUSI defined on the table. The usual solution is to drop the NUSI before loading and recreating if afterward. This restriction is fine as Fast loads can only be run against empty tables anyway. On the other hand, Multiloads are compatible with a NUSI. You do not need to drop the NUSI before loading. Nevertheless, from a performance point of view, you may consider the same drop and recreate approach like for a Fastload.

More on indexing here:
Teradata Indexing
The Primary Index Choice


The Teradata Partitioned Primary Index (PPI) Guide


Even though solid state disks (SSDs) are increasingly replacing hard disks, access to mass storage will always be the slowest process in a database system. Therefore, all database providers search for methods to reduce the number of I/Os.
As of Release 14.10, Teradata provides functions for reducing I/Os such as column storage, block-level compression or Teradata Intelligent Memory.

These new features will definitely help to improve performance, but we would like to point you to a mature method that meets your improvement needs in equal measure:

The partitioned primary index (PPI)

What is a Partitioned Primary Index?

Commonly queried rows are stored together in partitions so that you can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not contain requested rows are not read.

Incidentally, other database systems, such as Netezza, go the other way round by storing the information about the data blocks where the requested rows are not stored. Netezza called this method “Data Skipping”.

Partitioning will not help to avoid table scans, but it will reduce the number of data blocks that each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.

How is Teradata Partitioning implemented?

The idea behind the Teradata Partitioned Primary Index is to keep the records of a table packed together in clearly delineated containers of which there are not too many in total, about the size of the table. In that sense, Partitioning is just a different way of structuring the data records on the disks.

In the case of partitioned tables (PPI Tables), the Primary Index still determines the responsible AMP of each row. Only the way the rows are stored in the data blocks is different: While not partitioned tables store their rows sorted by rowid only, partitioned tables (PPI tables) store rows at first inside the corresponding partitions and only afterward sorted by the rowid.

Whenever a data record is accessed, the cylinder index of the AMP is queried to find out on which cylinder the first data block of the accessed partition is located. After positioning at the first data block, all remaining data blocks of the partition (or subpartitions in the case of multilevel partitioning) can be fetched into the FSG cache by scanning in a sequence with a minimum number of disk IO’s.

It is a remarkable technical detail that Teradata internally considers all tables to be partitioned. NPPI tables are nothing else than PPI tables with exactly one partition, namely partition number zero,  containing all table rows.  You should always collect statistics on the dummy column PARTITION, even for not partitioned tables, as this is used by the optimizer to estimate the table cardinality of NPPI tables.

The process of accessing chunks of data along the partitioning attributes is often called partition elimination. This convention might lead you to believe that an aforementioned excluding approach is in place when in fact it is more of a “partition picking” that takes place.

You may wonder in which way partitioning is different from indexing techniques on Teradata.

Maybe the most notable difference is that all Teradata index types consume permanent space. Any Teradata index is a sub-table redundantly storing a subset of columns of its related base table.

While we still experience an impact on the occupied permanent disk space, this is not related to the redundant storage of columns but caused by the overhead of storing the records in a partitioned way: PPI table rows are 2 to 8 bytes wider than the equivalent not partitioned table row. The extra bytes are used to store the internal partition number of each row.

Traditionally, one of the primary applications for a PPI is partitioning by a date. Most reports have a date dimension as part of their reporting filters. Consider a call detail table in telecommunication companies. These are huge tables holding detailed call information about each single call for a particular time frame. Analytics take place directly on this one big table, and often along with joins to other tables are not mandatory in providing the query result.

Imagine a call detail table containing a full year of data on a daily level, but you only need one day in your report result dataset. If you partition this table by date, each AMP  can restrict access exactly to this one date i.e. only 1/365 of the rows have to be moved from the disks into the AMPs memory (FSG cache).

Partitioning should always be considered over the use of other index types if ranges of data records have to be accessed together. Typical workloads are aggregations (“sum up all account balances for my business segment” etc.) which you often find in reporting requirements.

On the other hand, certain indexing techniques that allow direct access to the data blocks (primary index, unique secondary index, join index with matching primary index definition, etc.) are usually right for OLTP applications (“Give me the customer name for the client where customer id = 123”).

As you can combine partitioned tables with other index types, you can improve usability even further.

Until Teradata Release 12, several restrictions existed if you wanted to partition a table, especially in the context of data types. There was no possibility to partition character columns and timestamps. At least there was a workaround for character columns:


With Teradata Release 13.10, character and timestamp column partitioning is possible in a direct way.

Even the system variable CURRENT_DATE can be used in the partitioning statement for timestamps:


Using CURRENT_DATE in your partition statement makes it easy to increase your partitions up to the most current date:

— Delete the rows not matching any partition

— Insert the rows not matching into a backup table

There is three kind of partitioning supported: A simple partitioning by column, the CASE_N syntax, and the RANGE_N syntax.

As the names already indicate, RANGE_N allows you to create ranges of rows which will end up in the same partition and CASE_N allows you to use the simple CASE … WHEN statement.

Here is one example of an often used RANGE_N partitioning by date:


As you can see from the above example, there exist two distinct partitions which you can add to each table:

All rows not matching any of the defined partitions will end up in the NO RANGE bucket. If you do not specify a UNKNOWN partition and try to insert records not matching any specified partition, you will receive an error message, and the insert will fail.

The UNKNOWN partition is used to hold rows with NULL values in the partition expression.

Here is one example for a CASE_N partitioning:

PARTITION BY CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN);

Again, we can add the two special partitions as required.

CASE_N partitioning should be utilized if you want to group values together into partitions.

Partitioning is not restricted to one level. Multilevel Partitioning allows the partitioning of tables on more than one level. Here is one example:

CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN),

Just keep in mind that some partitions are limited. Up to release 13.10 of Teradata, the maximum number of partitions was 65535. Pay attention to the fact that this is the number of combined partitions in the case of multi-level partitioning and don’t forget to count the NO RANGE and UNKNOWN partitions as well. You have to multiply the number of partitions on each level to get the overall number of partitions in use. Starting with Teradata 14.10 it is possible to have more than 9.000 Quintillion partitions.
When does Teradata partition elimination take place?

Many misconceptions exist about when partition elimination takes place.

First, you have to distinguish between partition elimination and primary index access. Partition elimination occurs independently of any involvement of the Primary Index in a query.

Although the primary index choice plays a significant role in partition decisions, most of the time it is the elimination of partitions that accounts for the vast leap in IO reduction.

These are the possible performance patterns for partitioned primary index access. They are dependent on how PI and Partitions are referenced:

teradata ppi

Furthermore, for multi-level partitioning, you do not have to include all partitions in the WHERE condition to be able to cut partitions. Each of the partitions can be addressed independently, which is indeed a very nice feature in Teradata.

Now for the limitations of the primary index choice in the context of partitioning.

Many of us have heard about a performance difference between tables including all partition columns in the primary index and tables that do not include the partition columns in the primary index. Also, if tables are not including the partitioning columns, the primary index cannot be unique. What’s behind all this?

To understand the relation between primary index uniqueness and partitioning, we have to recall how rows are stored on the disks:

In the first step, the rows are hashed to the responsible AMPs. In the second phase, they are inserted into the proper partitions and sorted by the ROWID.

Imagine at first a table with a non-unique primary index: Many rows can have the same primary index and will hash to the same AMP. However, each of these rows could belong to a different partition.

In the case of a not partitioned tables, all rows with the same hash are stored together, in the event of a partitioned table the rows are scattered across different partitions.

Now imagine it was allowed to have a partitioned table with a unique primary index and without the partition column being part of the primary index. Any update or insert statement would need Teradata to check each partition to avoid the creation of duplicates. Having to check each partition is very inefficient from a performance point of view.

Also, keep in mind that if the primary index is not including the partitioning columns, each time a primary index access is required, the responsible AMP has to scan all its partitions for this particular primary index. The scan of all partitions can be avoided if you include the partition columns into the primary index. I consider this a minor problem, more related to OLTP.

Table partitioning has other impacts on performance as well. Especially when joining a non-partitioned and a partitioned table together, Teradata has to apply different join techniques for non-partitioned tables. A Sliding Window merge join is one of the join techniques related to partitioning. Another option is to “de-partition” the partitioned table and doing a regular merge join or to partition the table which is not partitioned.

Similar to the joining of two non-partitioned tables, from a performance point of view, it is best to have the same primary index on both tables, and a join is taking place on all primary index columns and additionally on the partition columns. Anything else requires less efficient join techniques.

Hence, while the elimination of partitions may significantly reduce disk IO’s you have to keep an eye on its join performance impact.

Some more benefits arise when using partitioned tables: The records of complete partitions can be removed from the table with a simple statement and, more important, without involving the transaction log:


Finally, here are the “best practice” statistic recommendations for partitioned tables:

  • Table PI
  • Partitioning column(s)
  • Columns in any not partitioned table that are equated to the partitioning column of the partitioned table
  • The system-derived PARTITION dummy column of all partitioned tables

I hope I could convince you that I consider partition elimination an important tool in the toolkit for performance optimization.

More on indexing & partitioning here:
Teradata Indexing
The Secondary Index in Teradata – The NUSI compendium
Teradata Golden Tuning Tipps 2017 – Take your Skills to the next Level!

If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.

Teradata Quiz
Teradata Quiz
Developer: Roland Wenzlofsky
Price: Free

The Teradata Hash Index Guide


teradata hash index

As it is valid for all Teradata index structures, the goal of the Teradata Hash Index is to minimize disk IOs by offering an alternative access path to the rows.

We can achieve this goal, by using three main effects:

  1. The storage of a vertical subset (columns) of the table in the hash index structure
  2. The selection of a better Primary Index to avoid costly redistribution activities for join preparation.
  3. The Ordering of rows by value, to support range scans

The Teradata Hash Index can be designed to cover all selected columns, or it may only partially cover the columns.

In the first case, base table access can be skipped altogether, but if the Hash Index is not covering the selected columns, the base table rows can be obtained in a subsequent step.  Each index row carries the ROWID, which is pointing to the related base table row (The ROWIDs from the Hash Index are extracted into a spool, in a subsequent step the spool is used to access the base table rows).

The details above seem similar to the way Teradata Secondary Indexes are designed. But as Hash Indexes allow to define their own Primary Index, rows can be distributed across the AMPs in a way which fits your workload requirements. One can’t do this with Secondary Indexes.

Similar to a Non-Unique Secondary Index (NUSI), the rows of a hash index can be stored on the same AMP like the related base table row, simply by choosing the Primary Index of the base table as the Primary Index of the hash index.

As valid for any index, Teradata Hash Indexes are maintained automatically by the system, and there is overhead involved in the case of UPDATE, INSERT and DELETE statements as the base table, and the index structures have to be maintained.

Statistics are crucial and have to be collected at least on the Primary Index columns of the Hash Indexing to allow the Teradata Optimizer to estimate the costs of the Hash Index usage correctly.

Although the Teradata Hash Index can be viewed as a specialized form of a Single Table Join Index (STJI) – there are a lot of features in common – some differences exist which should be considered when having the choice between the usage of a Single Table Join Index and the Hash index.

Maybe the most outstanding limitations compared with a Single Table Join Index are:

– A Hash Index cannot have a Partitioned Primary Index
– A Hash Index cannot have a Non-Unique Secondary Index.
– Hash Indexes cannot be specified for NOPI or column‑partitioned base tables as they are designed around the Teradata hashing algorithm.
– A hash index cannot be column partitioned
–  A hash index must have a Primary Index; a Single Table Join Index can be created with or without a primary index if a table is column-partitioned (as column stores on Teradata never have a Primary Index)

Here is an example for creating a Hash Ordered Hash Index:


Similarly, you can create a Value Ordered Hash Index, which might be useful for retrieving rows containing ranges of values in the considered column(s):CREATE HASH INDEX HASH_IDX (COL1, COL2) ON MYTABLE
BY (COL1) ORDER BY VALUE (COL2);In both cases, the “by” clause defines the columns used for data distribution (like the Primary Index), but keep in mind that the columns used for data distribution have to be part of the columns which make up the Hash Index column list.If we want to increase fault tolerance, FALLBACK protection can be used in the same way like it is used for base tables.

A second copy of each Hash Index row will be stored on a fallback AMP. In case the main AMP is down, and you don’t have fallback protection, you are not able to use the Hash Index anymore and updates on the base table are not allowed until the AMP is coming online again.

Nevertheless, keep in mind, that fallback protection doubles the space needed to store the Hash Index structure.

The conclusion is that Hash Indexes are a restricted form a Single Table Join Indexes and I do not assume a big difference in performance. As an advantage, the syntax for creating a Hash Index is simpler than the one used for a Single Table Join index.

More on indexing here: Teradata Indexing



The Teradata USI compendium

The Secondary Index in Teradata provides an alternate access path to the data records with the goal to reduce the disk IO’s while retrieving the data.

While B+ Tree Indexes are quite commonly used in database systems, Teradata uses a different approach.

Teradata distinguishes between unique secondary indexes (USI) and non-unique secondary indexes (NUSI). Although the naming may seduce to assume they are technically the same, they are not at all. This article is related to the USI.

Unique secondary indexes are based on the same hashing algorithm Teradata applies to distribute data evenly across all disks. Technically, unique secondary indexes are tables like any other table. However, they are so-called sub-tables which mean they do not show up as a real object in any dictionary table (in contrast to join indexes).

The base table data is hashed by the primary index column(s), whereas the secondary index rows are hashed by another column or combination of columns which presents the additional data path.

Each USI row contains the ROWID of the base table row,  making it possible to lookup the base table row directly.

Teradata USI access is most of the times a 2-AMP access: At first, the secondary index row is looked up. This row contains the ROWID of the base table which is used to lookup the base table row. Sometimes, if lucky, the USI row and the base table row are located at the same AMP,  making the retrieval of the base table row AMP-local:


The same as a unique primary index, the unique secondary index distributes evenly across the AMPs, and there is no need to take care about data skewing.

As always, Teradata needs recent statistics to consider index usage. A USI has to be highly selective. Otherwise, Teradata probably decides not to use it when building the execution plan. It is stated, that if more than 10% of the table rows would be selected from the USI, it will not be used. 10% is only a rule of thumb.

The usual approach is: create the USI you would like to be used and check if it is indeed used (EXPLAIN statement). If it is used, keep it, otherwise drop it (remember, unique secondary indexes are physical tables occupying disk space).

 Information about the Teradata USI you will find nowhere else:

The detailed data retrieval process

  • The Parsing Engine is consulting the hash map and locates the AMP carrying the USI row
  • The AMP containing the USI row finds the USI data block via the Master Index & the Cylinder Index
  • The USI data block is  moved into the FSG cache
  • The AMP does an in-memory binary search of the data block and locates the USI row
  • The designated row carries the ROWID of the base table, the Parsing Engine uses this ROWID to look up the base table AMP and instructs it to retrieve the base table row
  • The base table AMP locates the data block via Master Index & Cylinder Index lookup
  • The base table AMP moves the data block into the FSG cache
  • With an in-memory binary search on base table block, the base table row is located

teradata usi

See also:
The Secondary Index in Teradata – The NUSI compendium
The Teradata Join Index Guide – We leave no Questions unanswered!
The Teradata Partitioned Primary Index (PPI) Guide
Watch the Teradata Indexing Video Course


The Teradata Join Index Guide

 What is the Teradata Join Index (JI)?

The Teradata Join index stores pre-joined tables, the result of aggregations, or merely change a table to a different physical structure. No matter which of the mentioned activities lies at the heart of the Join Index, all of them are stored permanently on the disks.

Although the name indicates that join indexes are an index structure, they have to be considered more as an extra layer of pre-joined, pre-aggregated or permanent tables. Nevertheless, we can’t deny, that there are some similarities with other index structures.

In contrast to secondary indexes, which are stored as internal sub-tables, Join Indexes are saved as separate physical database objects. Therefore, a Join Index may be retained in whatever database you like.

As join indexes are stored similar to base tables, their maintenance cost is available in the DBC.DBQLOGTBL table (Teradata stores no information about maintenance cost of any other index type).

Join indexes cannot be peeped through. Also, their usage or circumvention is not subject to user decision. Only the Teradata Optimizer decides when it makes sense, – from a cost point of view –  to use the join index, and when it is cheaper to use the underlying table structures.

While real indexes are mostly limited to the tasks of being an additional access path to the data, join indexes can be designed in various ways and for different reasons.

When should we use a Teradata Join Index?

  • Join Indexes are Ideal for many joins of n middle to large tables with a significant number of rows from both tables.
  • Useful for many joins of large tables and a relatively small set of columns.
  • If you often run queries with complex expression in its predicate.
  • Join indexes help in de-normalization.
  • They are ideally suited for alternative partitioning.
  • Join indexes are very useful when alternate keys are used for joining.
  • Move vast and time-consuming joins or aggregations into the batch windows.
  • Join indexes even allow direct access via the primary index of the JI.

As join indexes are similar to real tables, you can add a secondary index, apply partitioning, etc., which aids in the further performance.

The Teradata Join Index and Partitioning

  • The Join Index can be partitioned
  • A Partitioned JI for a non-partitioned base table is ok, but non-partitioned JI on partitioned base table not recommended from a performance point of view
  • You cannot specify a partition for a row compressed Join Index

Full or partial Coverage of the Join Index

When the join index is not covering, it can lookup the base table row by the ROWID. A covering join index has all required columns and can directly satisfy the query without accessing the base table.


There is no full coverage for SELECT * FROM table_name queries.

Avoid the practice of writing SQL queries without specifying a column list or join index coverage is not possible!

The usage of a join index with partial coverage works in the following situations:

  • One of the columns in the join index definition is the keyword ROWID. In this case, the base table ROWID is staying with each index row, and a lookup of the base table row can be done.
  • The column set of the UPI of the underlying table is part of the definition. In this case, the column combination can be hashed, and the ROWID of the base table row can be derived
  • The column set of the NUPI of the underlying table is part of the definition plus either
    (a) one of the columns in the definition of that index is the keyword ROWID or
    (b) the column set defining a USI on the underlying base table.

For (a) like in the case of the USI coverage, the base table rows can be accessed via the ROWID

For (b), similar to the UPI case described above, just that the base table rows can be obtained indirectly by hashing the USI columns,                       extracting the base table ROWIDs from the USI into the spool and retrieving the base table rows by scanning this spool.

Coverage does not guarantee use of a Join Index. The cost of using must be estimated to be less than the cost of not using!

Restrictions on SQL Operations

  • FULL OUTER JOIN not allowed
  • LEFT or RIGHT JOINS: on the inner side at least one non-nullable column
  • OUTER JOIN preferable for Join Index usage likelihood, but not allowed for Aggregate JI
  • HAVING and QUALIFY not allowed
  • No Set Operations: UNION, INTERSECT, MINUS
  • No Sub queries
  • No inequality conditions for ON clauses in join index definitions. Supported if ANDed to at least one equality join condition
  • Only <, <=, >=, > as comparison operators in join allowed
  • TOP n and SAMPLE not allowed

The Teradata Multi-Table Join Index (MTJI)

Multi-Table join indexes give us the possibility to move resource consuming joins of tables from the online to the batch window. Moving workload will not cut the total system load, but shift it to a better time.

The syntax for a Multi-Table Join Index is similar to the one when we create a table from an SQL select statement:

SELECT t01.PK, t01.COL_A,t02.COL_B
<TABLE_A> t01
<TABLE_B> t02
t01.PK = t02.PK

Above example shows several important characteristics of a join index:

Of course, we could make the same result with a temporary table. But contrary to temporary tables, join indexes are hidden. While we could use a temporary table, the usage of the join index depends on the Optimizer and the availability of statistics.

What does this mean for us? Join Indexes will be used automatically by the system, the usage of a temporary table depends on us. If we can reach a point where Teradata reuses a join index on several occasions, it will improve performance. Otherwise, we probably should stay with temporary tables, offering full control of usage.

There is one significant disadvantage of Multi-Table join indexes: Often we create them to do pre-joining of tables (joins during query execution are expensive)

We may have a report that joins the same two tables together on a continuous base. If this report needs to process large amounts of data, it will cost both processing time and a lot of disk space.

If we want to overcome any possible deviation between indexed columns and the queried columns, it is feasible to store the pseudo-column ROWID in the join index which allows the usage of this index even if not all needed columns are available in the query.

Teradata will get the missing columns from the base table (as it has the ROWID available, this is the same as a primary index access).

All join indexes are maintained by Teradata when base table rows change, but base tables with join indexes can’t be loaded with any bulk load utility (MultiLoad, FastLoad).

This limitation is somehow conflicting to me: I expect to join indexes to be useful to avoid joins between big tables, but I can’t bulk load these tables?

Restrictions for outer-joined tables in the join index give some more headache. For example:

SELECT t01.PK, t01.COL_A,t02.COL_B
<TABLE_A> t01
<TABLE_B> t02
t01.PK = t02.PK

Each outer join  in a multi-table join index has to meet the following restrictions (the example does a LEFT OUTER JOIN, for a RIGHT OUTER JOIN, please switch left and right in the text below):

– All columns from the left table have to be selected
– At least one column from the right table requires the restriction “NOT NULL”.

Multi-Table Join Indexes can be compressed by putting brackets around a group of columns.  Compression reduces occupied disk space. Highly compressible column combinations are obviously good candidates. Such combinations are stored only once, but I could not find any details about the internal minutes of this approach.

SELECT t01.PK,( t01.COL_A,t02.COL_B)
<TABLE_A> t01
<TABLE_B> t02
t01.PK = t02.PK

The Teradata Single-Table Join Index (STJI)

The name is somehow misleading as no join is involved. Single-Table join indexes are created on single tables. Their main purpose is to have the same table available with a different primary index or to have a smaller table (the join index table) with fewer columns to cut IOs.

Let’s assume we have the following table below:


By creating the below join index, we now have physically stored the same table twice, with two different primary indexes:

SELECT t01.PK, t01.COL_A
<MyTable> t01

The join index on a different primary index gives the Teradata Optimizer an additional data access path.

The Teradata Aggregate Join Index (AJI)

Aggregate join indexes are pre-aggregations which are automatically maintained by Teradata. They can be based on single or multiple tables. Below is an example of a single-table aggregate join index:

<MyTable> t01

Aggregate join indexes are very limited in the aggregation functions which can be applied: SUM and COUNT and since Teradata Release 14.10 MAX and MIN are available as well.

Usability of the aggregate join index requires:

  • The grouping clause must include all columns in the grouping clause of the query.
  • The WHERE clause columns that join to tables, not in the aggregate join index must be part of the join index definition
  • For an aggregate join index with PPI, an aggregated column cannot be a partitioning column

The Teradata Optimizer will consider rewriting a query to make use of an existing AJI.
It will take into account an existing AJI in the following situations:

  • For the  coverage of queries containing a sub query (derived table spool)
  • For internally created distinct and grouping steps
  • To partly cover outer join queries
  • For partial/early grouping steps

The Teradata Sparse Join Index

Adding a WHERE condition to the join index that rules out some part of the table content turns a given index into a sparse join index. They allow us to cut the amount of data which has to be stored but requires queries to include the same WHERE condition. Otherwise, this join index will never be used.


From my experience, it ‘s hard to convince Teradata to use an individual join index, even with all relevant statistics collected.

Since they may be of great help in a slowly changing environment with simple queries and little need for change and invention of new reports, their potential to move ahead in processing time and resource consumption is limited in short-lived, dynamic reporting settings.

Furthermore, too many restrictions, for my taste. All this makes me prefer the real temporary tables in most cases, that also qualify for Join Indexes.

Be warned when experimenting with join indexes in a production environment. They may have unexpected adverse side effects. I will give you one real-life example:

Our daily batch load had a SQL step deleting a huge, 6+ billion records table:

DELETE FROM the_huge_table;

Not a big deal. The cylinder index is marking the related data blocks as free, and that’s it. No transaction log involved at all. The step normally finished in less than a second.

Not being aware of this delete action, I added a multi-table join index for testing purposes on top of the huge table. During the next batch load, the maintenance of this join index needed more than 20 hours. Finally, we decided to abort the transaction and cancel the rollback of the table -luckily inconsistency of this table was not an issue as it is deleted and repopulated on a daily base 😉

The query below helps to find all join indexes available in a Teradata System:

SELECT * FROM dbc.Indices WHERE indextype=’J’;

Is there anything you are missing in this article and you want us to write about it? Is there something wrong which we should correct?

Please leave us a comment!

See also:
The Secondary Index in Teradata – The NUSI compendium