February 24

0 comments

A Better Alternative For The Teradata NUSI

By DWH Pro Admin

February 24, 2020


The initial situation without any index

In this blog post, I will show you how you can optimize a query step by step using the right tools that Teradata offers you.

Let's start with the following test scenario:

CREATE TABLE Orders
(
	OrderId  BIGINT NOT NULL,
	CustomerId	BIGINT NOT NULL,
	Amount		BIGINT
) PRIMARY INDEX (OrderId);
INSERT INTO Orders 
SELECT     
  ROW_NUMBER() OVER (ORDER BY 1),     
  RANDOM(1,10000) AS CustomerId,     
  RANDOM(1,100)  
FROM SYS_CALENDAR.CALENDAR;  

The generated data is well distributed.
As I will show you later how selective the query has to be to use the tested indexes we define later, I set a large part of the rows to the same Primary Index value which I will use for the index:

UPDATE ORDERS SET CustomerId = 9999 WHERE CustomerId > 5000;
SELECT CustomerId,COUNT(*) FROM ORDERS GROUP BY 1
ORDER BY 2 DESC;
CustomerIdCount(*)
99997383
783430
500318

This will provide us with a test table with enough dummy data. Now we collect statistics on all columns so that the optimizer can create optimal estimates:

COLLECT STATS ON ORDERS COLUMN(OrderId); 
COLLECT STATS ON ORDERS COLUMN(CustomerId); 
COLLECT STATS ON ORDERS COLUMN(Amount);

Without Index Usage

This is the query we want to optimize. We let it run without an index for now:

SELECT * FROM Orders WHERE CustomerId = 3000;

This is the Execution Plan:

Explain 
SELECT * FROM Orders WHERE CustomerId = 3000;

  1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved
     RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Orders in TD_MAP1 for read.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.Orders by
     way of an all-rows scan with a condition of (
     "DWHPRO.Orders.CustomerId = 3000") into Spool 1 (group_amps),
     which is built locally on the AMPs.  The size of Spool 1 is
     estimated with high confidence to be 8 rows (360 bytes).  The
     estimated time for this step is 0.02 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.02 seconds.

The optimizer has no choice but to make a full table scan because the WHERE condition does not correspond to the primary index of the table.

Teradata NUSI Usage

Since the selectivity of the query is high, a NUSI can be used here. So we will create a NUSI which corresponds to the WHERE condition column:

CREATE INDEX (CustomerId) ON Orders;

Once again we run our query:

SELECT * FROM Orders WHERE CustomerId = 3000;

This is the Execution Plan:

Explain SELECT * FROM Orders WHERE CustomerId = 3000;

  1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved
     RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Orders in TD_MAP1 for read.
  3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.Orders by
     way of index # 4 "DWHPRO.Orders.CustomerId = 3000" with no
     residual conditions into Spool 1 (group_amps), which is built
     locally on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 8 rows (360 bytes).  The estimated time for this
     step is 0.01 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.01 seconds.

Since the NUSI is selective enough, the optimizer uses it. This is already an improvement over the previous scenario without index. But is there still better?

The “Hashed NUSI”

The use of the NUSI has already brought significant performance improvement.

However, NUSI has one disadvantage: it is an ALL-AMP operation since the NUSI index rows are always stored with the base table rows on the same AMP and are not distributed by hashing.

So how about if we make ourselves something like a hash distributed NUSI? How to do this I will show you in the next example:

CREATE JOIN INDEX Orders_JI
AS
SELECT CustomerId,ROWID FROM ORDERS PRIMARY INDEX (CustomerId);
COLLECT STATS ON Orders_JI COLUMN(CustomerId);

COLLECT STATS ON Orders_JI COLUMN(CustomerId);

This join index offers all the advantages of the NUSI but requires only one AMP to find the ROWIDs of the base table rows and can then access them like the NUSI.

We run our query again:

SELECT * FROM Orders WHERE CustomerId = 3000;

This is the Execution Plan:

Explain SELECT * FROM Orders WHERE CustomerId = 3000;

  1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved
     RowHash to prevent global deadlock.
  2) Next, we lock DWHPRO.Orders in TD_MAP1 for read.
  3) We do a single-AMP RETRIEVE step from DWHPRO.ORDERS_JI by way of
     the primary index "DWHPRO.ORDERS_JI.CustomerId = 3000" with no
     residual conditions into Spool 2 (all_amps), which is
     redistributed by the hash code of (DWHPRO.ORDERS_JI.Field_1026) to
     few or all AMPs in TD_Map1.  Then we do a SORT to order Spool 2 by
     the sort key in spool field1.  The size of Spool 2 is estimated
     with high confidence to be 8 rows (248 bytes).  The estimated time
     for this step is 0.00 seconds.
  4) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
     way of an all-rows scan, which is joined to DWHPRO.Orders by way
     of an all-rows scan with no residual conditions.  Spool 2 and
     DWHPRO.Orders are joined using a row id join, with a join
     condition of ("Field_1 = DWHPRO.Orders.ROWID").  The result goes
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with low confidence to be 8 rows
     (360 bytes).  The estimated time for this step is 0.01 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.01 seconds.

This join index is an index that does not need to be unique but only requires an AMP to find the index rows.

The Improved “Hashed NUSI”

A final improvement is still possible. Our Join Index stores the Index Rows as shown in the graphic below:

Teradata Join Index
Uncompressed Join Index

The NUSI on the other hand stores all ROWIDs belonging to one index value together.

But even that is no problem. It is easy to achieve this format with a Join Index if you use the following syntax (note the brackets).

The second pair of parentheses compresses the column combinations, in our case, this is just the ROWID pointing to the base table:

CREATE JOIN INDEX Orders_JI
AS
SELECT (CustomerId),(ROWID) FROM ORDERS PRIMARY INDEX (CustomerId);
COLLECT STATS ON Orders_JI COLUMN(CustomerId);

COLLECT STATS ON Orders_JI COLUMN(CustomerId);

This syntax causes the same ROWIDs as in NUSI to be stored in an index row (several only if one index row is not sufficient to store all ROWIDs):

Why is the compression of the Teradata join index so important?

We want to get the most out of it. We have started without an index, have arrived via the NUSI at a join index that performs better than the NUSI (since it only needs an AMP).

By storing in a compressed form not only space but also additional I/Os can be saved:

SELECT TABLENAME,SUM(CURRENTPERM) AS Perm 
FROM DBC.TABLESIZE 
WHERE DATABASENAME = 'DWHPRO' 
AND TABLENAME LIKE 'Orders_JI%'
GROUP BY 1;
TablenamePerm
Orders_JI 1 875 968
Orders_JI_Comp 1 114 112

The compressed index is only about 60% as large as the uncompressed index!

Final Words

I hope this example has shown you that Teradata offers a lot of possibilities to optimize queries.

But don't get this example wrong: Not always the Join Index is the better choice, but also not always the NUSI.

It requires a lot of experience and knowledge to create the right index landscape.

DWH Pro Admin


Teradata and Data Warehouse Specialist with more than 20 years of experience

You might also like

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Never miss a good story!

 Subscribe to our newsletter to keep up with the latest trends!

>