teradata join index

 What is the Teradata Join Index (JI)?

Join indexes store pre-joined tables, the result of aggregations, or merely change a table to a different physical structure. No matter which of the mentioned activities lies at the heart of the Join Index, all of them are stored permanently on the disks.

Although the name indicates that join indexes are an index structure, they have to be considered more as an extra layer of pre-joined, pre-aggregated or permanent tables. Nevertheless, we can’t deny, that there are some similarities with other index structures.

In contrast to secondary indexes, which are stored as internal sub-tables, Join Indexes are saved as separate physical database objects. Therefore, a Join Index may be retained in whatever database you like.

As join indexes are stored similar to base tables, their maintenance cost is available in the DBC.DBQLOGTBL table (Teradata stores no information about maintenance cost of any other index type).

Join indexes cannot be peeped through. Also, their usage or circumvention is not subject to user decision. Only the Teradata Optimizer decides when it makes sense, – from a cost point of view –  to use the join index, and when it is cheaper to use the underlying table structures.

While real indexes are mostly limited to the tasks of being an additional access path to the data, join indexes can be designed in various ways and for different reasons.

When should we use a Teradata Join Index?

 

  • Join Indexes are Ideal for many joins of n middle to large tables with a significant number of rows from both tables.
  • Useful for many joins of large tables and a relatively small set of columns.
  • If you often run queries with complex expression in its predicate.
  • Join indexes help in de-normalization.
  • They are ideally suited for alternative partitioning.
  • Join indexes are very useful when alternate keys are used for joining.
  • Move vast and time-consuming joins or aggregations into the batch windows.
  • Join indexes even allow direct access via the primary index of the JI.

As join indexes are similar to real tables, you can add a secondary index, apply partitioning, etc., which aids in the further performance.

 

The Teradata Join Index and Partitioning

  • The Join Index can be partitioned
  • A Partitioned JI for a non-partitioned base table is ok, but non-partitioned JI on partitioned base table not recommended from a performance point of view
  • You cannot specify a partition for a row compressed Join Index

 

Full or partial Coverage of the Join Index

When the join index is not covering, it can lookup the base table row by the ROWID. A covering join index has all required columns and can directly satisfy the query without accessing the base table.

Important:

There is no full coverage for SELECT * FROM table_name queries.

Avoid the practice of writing SQL queries without specifying a column list or join index coverage is not possible!

The usage of a join index with partial coverage works in the following situations:

  • One of the columns in the join index definition is the keyword ROWID. In this case, the base table ROWID is staying with each index row, and a lookup of the base table row can be done.
  • The column set of the UPI of the underlying table is part of the definition. In this case, the column combination can be hashed, and the ROWID of the base table row can be derived
  • The column set of the NUPI of the underlying table is part of the definition plus either
    (a) one of the columns in the definition of that index is the keyword ROWID or
    (b) the column set defining a USI on the underlying base table.

For (a) like in the case of the USI coverage, the base table rows can be accessed via the ROWID

For (b), similar to the UPI case described above, just that the base table rows can be obtained indirectly by hashing the USI columns,                       extracting the base table ROWIDs from the USI into the spool and retrieving the base table rows by scanning this spool.

Coverage does not guarantee use of a Join Index. The cost of using must be estimated to be less than the cost of not using!

Restrictions on SQL Operations

 

  • FULL OUTER JOIN not allowed
  • LEFT or RIGHT JOINS: on the inner side at least one non-nullable column
  • OUTER JOIN preferable for Join Index usage likelihood, but not allowed for Aggregate JI
  • HAVING and QUALIFY not allowed
  • No Set Operations: UNION, INTERSECT, MINUS
  • No Sub queries
  • No inequality conditions for ON clauses in join index definitions. Supported if ANDed to at least one equality join condition
  • Only <, <=, >=, > as comparison operators in join allowed
  • TOP n and SAMPLE not allowed

The Teradata Multi-Table Join Index (MTJI)

Multi-Table join indexes give us the possibility to move resource consuming joins of tables from the online to the batch window. Moving workload will not cut the total system load, but shift it to a better time.

The syntax for a Multi-Table Join Index is similar to the one when we create a table from an SQL select statement:

CREATE JOIN INDEX <MyJoinIndex> AS
SELECT t01.PK, t01.COL_A,t02.COL_B
FROM
<TABLE_A> t01
INNER JOIN
<TABLE_B> t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

Above example shows several important characteristics of a join index:

Of course, we could make the same result with a temporary table. But contrary to temporary tables, join indexes are hidden. While we could use a temporary table, the usage of the join index depends on the Optimizer and the availability of statistics.

What does this mean for us? Join Indexes will be used automatically by the system, the usage of a temporary table depends on us. If we can reach a point where Teradata reuses a join index on several occasions, it will improve performance. Otherwise, we probably should stay with temporary tables, offering full control of usage.

There is one significant disadvantage of Multi-Table join indexes: Often we create them to do pre-joining of tables (joins during query execution are expensive)

We may have a report that joins the same two tables together on a continuous base. If this report needs to process large amounts of data, it will cost both processing time and a lot of disk space.

If we want to overcome any possible deviation between indexed columns and the queried columns, it is feasible to store the pseudo-column ROWID in the join index which allows the usage of this index even if not all needed columns are available in the query.

Teradata will get the missing columns from the base table (as it has the ROWID available, this is the same as a primary index access).

All join indexes are maintained by Teradata when base table rows change, but base tables with join indexes can’t be loaded with any bulk load utility (MultiLoad, FastLoad).

This limitation is somehow conflicting to me: I expect to join indexes to be useful to avoid joins between big tables, but I can’t bulk load these tables?

Restrictions for outer-joined tables in the join index give some more headache. For example:

CREATE JOIN INDEX <MyJoinIndex> AS
SELECT t01.PK, t01.COL_A,t02.COL_B
FROM
<TABLE_A> t01
LEFT OUTER JOIN
<TABLE_B> t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

Each outer join  in a multi-table join index has to meet the following restrictions (the example does a LEFT OUTER JOIN, for a RIGHT OUTER JOIN, please switch left and right in the text below):

– All columns from the left table have to be selected
– At least one column from the right table requires the restriction “NOT NULL”.

Multi-Table Join Indexes can be compressed by putting brackets around a group of columns.  Compression reduces occupied disk space. Highly compressible column combinations are obviously good candidates. Such combinations are stored only once, but I could not find any details about the internal minutes of this approach.

CREATE JOIN INDEX <MyJoinIndex> AS
SELECT t01.PK,( t01.COL_A,t02.COL_B)
FROM
<TABLE_A> t01
LEFT OUTER JOIN
<TABLE_B> t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

The Teradata Single-Table Join Index (STJI)

The name is somehow misleading as no join is involved. Single-Table join indexes are created on single tables. Their main purpose is to have the same table available with a different primary index or to have a smaller table (the join index table) with fewer columns to cut IOs.

Let’s assume we have the following table below:

CREATE TABLE <MyTable> (PK INTEGER, COL_A INTEGER) PRIMARY INDEX (PK);

By creating the below join index, we now have physically stored the same table twice, with two different primary indexes:

CREATE JOIN INDEX <MyJoinIndex> AS
SELECT t01.PK, t01.COL_A
FROM
<MyTable> t01
PRIMARY INDEX (COL_A);

The join index on a different primary index gives the Teradata Optimizer an additional data access path.

The Teradata Aggregate Join Index (AJI)

Aggregate join indexes are pre-aggregations which are automatically maintained by Teradata. They can be based on single or multiple tables. Below is an example of a single-table aggregate join index:

CREATE JOIN INDEX <MyJoinIndex> AS
SELECT t01.PK, SUM(t01.COL_A)
FROM
<MyTable> t01
GROUP BY 1
PRIMARY INDEX (PK);

Aggregate join indexes are very limited in the aggregation functions which can be applied: SUM and COUNT and since Teradata Release 14.10 MAX and MIN are available as well.

Usability of the aggregate join index requires:

  • The grouping clause must include all columns in the grouping clause of the query.
  • The WHERE clause columns that join to tables, not in the aggregate join index must be part of the join index definition
  • For an aggregate join index with PPI, an aggregated column cannot be a partitioning column

 

The Teradata Optimizer will consider rewriting a query to make use of an existing AJI.
It will take into account an existing AJI in the following situations:

  • For the  coverage of queries containing a sub query (derived table spool)
  • For internally created distinct and grouping steps
  • To partly cover outer join queries
  • For partial/early grouping steps

 

The Teradata Sparse Join Index

Adding a WHERE condition to the join index that rules out some part of the table content turns a given index into a sparse join index. They allow us to cut the amount of data which has to be stored but requires queries to include the same WHERE condition. Otherwise, this join index will never be used.

Conclusion

From my experience, it ‘s hard to convince Teradata to use an individual join index, even with all relevant statistics collected.

Since they may be of great help in a slowly changing environment with simple queries and little need for change and invention of new reports, their potential to move ahead in processing time and resource consumption is limited in short-lived, dynamic reporting settings.

Furthermore, too many restrictions, for my taste. All this makes me prefer the real temporary tables in most cases, that also qualify for Join Indexes.

Be warned when experimenting with join indexes in a production environment. They may have unexpected adverse side effects. I will give you one real-life example:

Our daily batch load had a SQL step deleting a huge, 6+ billion records table:

DELETE FROM the_huge_table;

Not a big deal. The cylinder index is marking the related data blocks as free, and that’s it. No transaction log involved at all. The step normally finished in less than a second.

Not being aware of this delete action, I added a multi-table join index for testing purposes on top of the huge table. During the next batch load, the maintenance of this join index needed more than 20 hours. Finally, we decided to abort the transaction and cancel the rollback of the table -luckily inconsistency of this table was not an issue as it is deleted and repopulated on a daily base 😉

The query below helps to find all join indexes available in a Teradata System:

SELECT * FROM dbc.Indices WHERE indextype=’J’;

 

Is there anything you are missing in this article and you want us to write about it? Is there something wrong which we should correct?

Please leave us a comment!

 

See also:
The Secondary Index in Teradata – The NUSI compendium

Our Reader Score
[Total: 28    Average: 4.4/5]
The Teradata Join Index Guide – We leave no Questions unanswered! written by Roland Wenzlofsky on May 12, 2014 average rating 4.4/5 - 28 user ratings
SHARE
Previous articleThe Teradata Node Review
Next articleThe Teradata USI compendium
Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

6 COMMENTS

  1. Thanks a lot for this amazing article.

    Can you please elaborate the following statements:

    Move vast and time-consuming joins or aggregations into the batch windows.
    Join indexes even allow direct access via the primary index of the JI.

  2. DELETE FROM the_huge_table;
    Instead of Deleting 6+ billion records every day , i think it is wise to drop the table and recreate the table.

  3. Two data blocks can be merged if the occupied space drops below a certain limit. I am not 100% sure but I think it is the MERGEBLOCKRATIO from the table DDL which can be used to change the default value for block merges.

  4. Hi Roland,

    Teradata reads and writes on disk in terms of blocks, that is the lowest measure.
    Assuming our data blocks have more than one rows each, consider that we have to delete just one row from a table.
    How does Teradata proceed. Does it mark the entire block as free? What about remaining rows of the block.
    Does it create another fresh block with only remaining rows and then marks the old block as free?

    Regards,
    Virendra

  5. Hi,

    We have a scenario where a table has fields A and B, both Decimal (15,3). An AJI for this table has SUM(A) and SUM(B). When we use SEL SUM(A), SUM(B) FROM explain plan chooses AJI as access path which is ok because index is fully covering, but data is incorrect!
    We verified by running SEL SUM(A) as Z1, SUM(B) AS Z2, SUM(Z1 – Z2) as Z3. Now the optimizer accesses base table directly and we get correct values. What could be the reason?

LEAVE A REPLY

Please enter your comment!
Please enter your name here