The initial situation without any index
In this blog post, I will show you how you can optimize a query step by step using the right tools that Teradata offers you.
Let’s start with the following test scenario:
CREATE TABLE Orders ( OrderId BIGINT NOT NULL, CustomerId BIGINT NOT NULL, Amount BIGINT ) PRIMARY INDEX (OrderId);
INSERT INTO Orders SELECT ROW_NUMBER() OVER (ORDER BY 1), RANDOM(1,10000) AS CustomerId, RANDOM(1,100) FROM SYS_CALENDAR.CALENDAR;
The generated data is well distributed.
As I will show you later how selective the query has to be to use the tested indexes we define later, I set a large part of the rows to the same Primary Index value, which I will use for the index:
UPDATE ORDERS SET CustomerId = 9999 WHERE CustomerId > 5000;
SELECT CustomerId,COUNT(*) FROM ORDERS GROUP BY 1
ORDER BY 2 DESC;
CustomerId | Count(*) |
9999 | 7383 |
7834 | 30 |
5003 | 18 |
… | … |
This will provide us with a test table with enough dummy data. Now we collect statistics on all columns so that the optimizer can create optimal estimates:
COLLECT STATS ON ORDERS COLUMN(OrderId); COLLECT STATS ON ORDERS COLUMN(CustomerId); COLLECT STATS ON ORDERS COLUMN(Amount);
Without Index Usage
This is the query we want to optimize. We let it run without an index for now:
SELECT * FROM Orders WHERE CustomerId = 3000;
This is the Execution Plan:
Explain SELECT * FROM Orders WHERE CustomerId = 3000; 1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved RowHash to prevent global deadlock. 2) Next, we lock DWHPRO.Orders in TD_MAP1 for read. 3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.Orders by way of an all-rows scan with a condition of ( "DWHPRO.Orders.CustomerId = 3000") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 8 rows (360 bytes). The estimated time for this step is 0.02 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
The optimizer has no choice but to make a full table scan because the WHERE condition does not correspond to the table’s primary index.
Teradata NUSI Usage
Since the query’s selectivity is high, a NUSI can be used here. So we will create a NUSI which corresponds to the WHERE condition column:
CREATE INDEX (CustomerId) ON Orders;
Once again, we run our query:
SELECT * FROM Orders WHERE CustomerId = 3000;
This is the Execution Plan:
Explain SELECT * FROM Orders WHERE CustomerId = 3000; 1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved RowHash to prevent global deadlock. 2) Next, we lock DWHPRO.Orders in TD_MAP1 for read. 3) We do an all-AMPs RETRIEVE step in TD_MAP1 from DWHPRO.Orders by way of index # 4 "DWHPRO.Orders.CustomerId = 3000" with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 8 rows (360 bytes). The estimated time for this step is 0.01 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.01 seconds.
Since the NUSI is selective enough, the optimizer uses it. This is already an improvement over the previous scenario without an index. But is there still better?
The “Hashed NUSI”
The use of the NUSI has already brought significant performance improvement.
However, NUSI has one disadvantage: it is an ALL-AMP operation since the NUSI index rows are always stored with the base table rows on the same AMP and are not distributed by hashing.
So how about if we make ourselves something like a hash distributed NUSI? How to do this I will show you in the next example:
CREATE JOIN INDEX Orders_JI
AS
SELECT CustomerId,ROWID FROM ORDERS PRIMARY INDEX (CustomerId);
COLLECT STATS ON Orders_JI COLUMN(CustomerId);
COLLECT STATS ON Orders_JI COLUMN(CustomerId);
This join index offers all the advantages of the NUSI but requires only one AMP to find the ROWIDs of the base table rows and can then access them like the NUSI.
We rerun our query:
SELECT * FROM Orders WHERE CustomerId = 3000;
This is the Execution Plan:
Explain SELECT * FROM Orders WHERE CustomerId = 3000; 1) First, we lock DWHPRO.Orders in TD_MAP1 for read on a reserved RowHash to prevent global deadlock. 2) Next, we lock DWHPRO.Orders in TD_MAP1 for read. 3) We do a single-AMP RETRIEVE step from DWHPRO.ORDERS_JI by way of the primary index "DWHPRO.ORDERS_JI.CustomerId = 3000" with no residual conditions into Spool 2 (all_amps), which is redistributed by the hash code of (DWHPRO.ORDERS_JI.Field_1026) to few or all AMPs in TD_Map1. Then we do a SORT to order Spool 2 by the sort key in spool field1. The size of Spool 2 is estimated with high confidence to be 8 rows (248 bytes). The estimated time for this step is 0.00 seconds. 4) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by way of an all-rows scan, which is joined to DWHPRO.Orders by way of an all-rows scan with no residual conditions. Spool 2 and DWHPRO.Orders are joined using a row id join, with a join condition of ("Field_1 = DWHPRO.Orders.ROWID"). The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows (360 bytes). The estimated time for this step is 0.01 seconds. 5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.01 seconds.
This join index is an index that does not need to be unique but only requires an AMP to find the index rows.
The Improved “Hashed NUSI”
A final improvement is still possible. Our Join Index stores the Index Rows, as shown in the graphic below:

The NUSI, on the other hand, stores all ROWIDs belonging to one index value together.
But even that is no problem. It is easy to achieve this format with a Join Index if you use the following syntax (note the brackets).
The second pair of parentheses compresses the column combinations. In our case, this is just the ROWID pointing to the base table:
CREATE JOIN INDEX Orders_JI AS SELECT (CustomerId),(ROWID) FROM ORDERS PRIMARY INDEX (CustomerId); COLLECT STATS ON Orders_JI COLUMN(CustomerId); COLLECT STATS ON Orders_JI COLUMN(CustomerId);
This syntax causes the same ROWIDs as in NUSI to be stored in an index row (several only if one index row is not sufficient to store all ROWIDs):

Why is the compression of the Teradata join index so important?
We want to get the most out of it. We have started without an index, and have arrived via the NUSI at a join index that performs better than the NUSI (since it only needs an AMP).
By storing in a compressed form, not only space but also additional I/Os can be saved:
SELECT TABLENAME,SUM(CURRENTPERM) AS Perm FROM DBC.TABLESIZE WHERE DATABASENAME = 'DWHPRO' AND TABLENAME LIKE 'Orders_JI%' GROUP BY 1;
Tablename | Perm |
Orders_JI | 1 875 968 |
Orders_JI_Comp | 1 114 112 |
The compressed index is only about 60% as large as the uncompressed index!
Final Words
I hope this example has shown you that Teradata offers a lot of possibilities to optimize queries.
But don’t get this example wrong: Not always the Join Index is the better choice, but also not always the NUSI.
Creating the right index landscape requires a lot of experience and knowledge.