Understanding The Teradata Join Index: Benefits, Usage, And Implementation

Table of Contents

What is the Teradata Join Index?

Teradata Join indexes store pre-joined tables, aggregated results, or single table content in an alternate structure. Join indexes can be likened to materialized views.

Join indexes serve as more than just an index structure. They provide an extra layer consisting of pre-joined, pre-aggregated, or permanent tables that come equipped with their own Primary Index, Secondary Index, and row-level partitioning.

The join index is similar to a secondary index as it offers the Optimizer more ways to access data.

Teradata stores join indexes as distinct physical database entities, in contrast to secondary indexes that are kept as internal subtables.

The user may choose to save a join index in any database.

Because join indexes are stored like base tables, their maintenance costs are visible in table DBC.DBQLOGTBL (Teradata does not store information about the maintenance costs of other index types).

The Optimizer’s Use of a Join Index does not rely on the user’s discretion.

The Teradata Optimizer determines the cost-effectiveness of using the join index and the underlying table structures.

Secondary indexes provide one additional access path to the base table, typically consisting of a hash lookup or full index scan for a NUSI.

Join indexes may take on different models and form the foundation for partitioning or secondary indexes.

When to Use a Join Index?

Join together large tables with a significant number of rows
Join large tables selecting only a few columns
Queries with complex expressions in their predicate
Denormalization of PDM to increase performance
Create alternative row-level partitioning
Create an alternative Primary Index as an additional access path
Movement of time-consuming joins and aggregations into the batch window

A join index offers diverse data access paths and serves the purpose of optimizing performance.

Improved performance can be attained by crafting a join index with an alternative primary index or implementing row-level partitioning.

If querying is not feasible using the primary index or partitions of the join index, the Optimizer will employ a full index scan on the join index. This is viable if the index is considerably smaller than the base table, as fewer data blocks must be accessed.

The Primary Index of a Join Index

A unique Primary Index can be defined if the index is not compressed (unrelated to multivalue compression, but the compression of repeating groups in the index) and only on a single table join index.

Using a Unique Primary Index (UPI) on a base table offers numerous benefits, including preventing duplication when inserting or updating records.

The Primary Index typically facilitates data distribution, sorting rows by rowhash on the AMPs. However, the Join Index offers the option to select an alternative sort order through the “ORDER BY VALUE” clause. Sorting by value can enhance performance when executing range queries.

CREATE JOIN INDEX JI1 AS
    SELECT (a, b), (c, d, e)
    FROM TheTable
    LEFT JOIN Customer ON a= c
    ORDER BY a
    PRIMARY INDEX (b);

The value order is restricted to 4-byte numeric values. Historically, this has been the highest acceptable value for row hashes saved in the data block’s data pointer array. It is possible to use DATE as it is saved as INTEGER internally.

Join Indexes with Built-In Functions

If the functions DATE, CURRENT_DATE, or CURRENT_TIMESTAMP are used in the Expression partition of a join index, a one-time evaluation occurs when the index is created and not dynamically when the index is used. This also applies to using these functions in a WHERE condition of the join index.

However, “ALTER TABLE TO CURRENT” can be used on the index to update the value in the Expression partition to the current value.

Why you should use OUTER JOINS

Defining Join Indexes with OUTER JOIN is preferable, as non-matching rows are lost with INNER JOIN. This approach expands the scope of queries that can be accommodated.

Join Index and Partitioning

A Join Index can be Row or Column Partitioned
Partitioning adds access path
A Join Index can be partitioned on top of the non-partitioned table
A non-partitioned Join Index on top of a partitioned base table is not recommended for performance reasons
A Row compressed Join Index can’t be row-level partitioned
The Partitioning Expression of a join index can be numeric or of a character data type
DATE, CURRENT_DATE, or CURRENT_TIMESTAMP functions can be used in the partition expression of a join index

Join Index Coverage

If the join index does not contain all selected columns (“not covering”), it can access these missing columns using the ROWID of the base table. The dummy column “ROWID” usually must be included (exceptions see below) in the SELECT list.

The cost saving is that the relevant rowids are first extracted into a spool file using the join index. In a subsequent step, the base table rows belonging to the rowids are queried.

If the join index contains all SELECT list columns (“covering”), no access to the base table is required.

There is no coverage possible for SELECT * FROM table_name queries:

Avoid writing SQL queries without specifying column lists or join index coverage.

When will the Optimizer use a non-covering Join Index?

One of the join index definition columns is the dummy column “ROWID”. If this is the case, the base table ROWID stays with each index row, and a lookup of the base table row can be done as described above.
The UPI columns of the base table are available in the SELECT list of the Join Index. Teradata can hash this column combination to identify the AMP containing a row. Again, the rowids are used to access the base table. A message with the rowid is sent to the AMP to receive the required row.
The column set of a NUPI of the base table is contained in the select list of the join index (condition 1), and one of the two additional conditions is met:

(2a) The Join Index contains the dummy column “ROWID” or
(2b) A USI on the base table is available, which matches the column set of the NUPI.

If condition (2a) is met, the base table rows can be accessed via the column “ROWID” by rehashing the NUPI columns and accessing the base table rows via the rowids extracted into a spool file.

Condition (2b) is similar to the UPI case described above. The AMPs owning the requested rows are identified by rehashing the USI columns. Everything else is the same as in condition (2a)

Coverage does not guarantee the use of a Join Index. The cost of using it must be less than accessing the base table using another index or a full table scan.

What are the Limitations of a Join Index?

The Usage of FULL OUTER JOIN is not allowed.
LEFT or RIGHT JOINS: on the inner side, at least one non-nullable column must be selected
OUTER JOIN is preferable for Join Index usage likelihood but not allowed for Aggregate Join Index
HAVING and QUALIFY not allowed
No Set Operations are allowed (UNION, INTERSECT, MINUS)
No Subqueries are allowed
No inequality conditions in ON clauses of join index definitions are allowed. Exception: If they are ANDed to at least one equality join condition
Only <, <=, >=, > as comparison operators allowed
TOP n and SAMPLE not allowed
The join index is marked as invalid when restoring a base table or database.
A maximum of 64 columns per table per Join Index is allowed
A maximum of 32 Indexes can be defined on a table, including join indexes
UPI only allowed for single table join index
Value-ordered NUSI and Partition Expression on the same Join Index columns are prohibited.

Multi-Table Join Index

Multi-Table Join Indexes enable the transfer of resource-intensive joins from the online window to the batch window.

Distributing the workload enhances the system’s efficiency by transferring it to a more advantageous moment without diminishing it.

To create a join index, use a syntax akin to a CREATE TABLE AS statement.

CREATE JOIN INDEX  AS
SELECT t01.PK, t01.COL_A,t02.COL_B
FROM
 t01
INNER JOIN
 t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

The above statement highlights key features of a join index.

Join indexes offer a superior substitute to temporary tables. Nevertheless, their usage is not assured, unlike temporary tables. The Optimizer’s decision to utilize an existing join index is contingent on the availability of relevant statistics.

Create a join index only if the Optimizer can efficiently reuse it in various situations. The extra space required must be carefully considered. If the use of a join index cannot be consistently achieved, a temporary table may be a more suitable option.

To address the potential discrepancy between indexed and queried columns, storing the pseudo-column ROWID in the join index is possible. This enables the utilization of the index even when all necessary columns are not present in the query.

Teradata retrieves the absent columns from the base table utilizing the ROWID, which grants primary index access.

Limitations of the Multi-Table Join Index

The bulk load utilities only apply when no defined join index exists.

A multi-table join index is commonly utilized to pre-join tables and circumvent costly online joining processes, especially when generating ad-hoc reports. Joining tasks typically incurs significant expenses due to the need for redistributing vast amounts of data.

The Join Index offers a notable benefit: Teradata performs all maintenance tasks and implements optimized techniques to ensure that an Aggregate Join Index remains current without re-aggregating the entire index table for every minor alteration.

Fastload and Multiload are not feasible, creating a paradox.

Removing and recreating the join index, such as optimized aggregation, eliminates the aforementioned benefits.

Join index constraints pose a challenge for outer-joined tables.

CREATE JOIN INDEX  AS
SELECT t01.PK, t01.COL_A,t02.COL_B
FROM
 t01
LEFT OUTER JOIN
 t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

Every outer join within a multi-table join index must adhere to the following limitations:

All columns from the left table have to be selected
At least one column from the right table must be defined as NOT NULL.

Compressed Join Index

Using brackets to group columns can compress Multi-Table Join Indexes, reducing occupied disk space. Compression ensures that each group of compressed values is stored only once instead of being repeated in each row.

Common column value pairings are ideal candidates.

CREATE JOIN INDEX  AS
SELECT t01.PK,(t01.COL_A,t02.COL_B)
FROM
 t01
LEFT OUTER JOIN
 t02
ON
t01.PK = t02.PK
PRIMARY INDEX (COL_B);

Single Table Join Index

The name is misleading, as no actual joining occurs. Single-Table join indexes are formed solely from a single base table.

Their main objective is to make available a table with a varied primary index, partitioning, or a smaller table (known as the join index table) that requires spooling fewer columns.

Assuming we have the table below:

CREATE TABLE  (PK INTEGER, COL_A INTEGER) PRIMARY INDEX (PK);

The creation of the join index below has resulted in the physical storage of the same table twice, albeit with different primary indexes.

CREATE JOIN INDEX  AS
SELECT t01.PK, t01.COL_A
FROM
 t01
PRIMARY INDEX (COL_A);

A distinct primary index on the join index provides the Teradata Optimizer with an extra route for data retrieval.

Aggregate Join Index

Teradata automatically maintains pre-aggregated Aggregate Join Indexes in an optimized manner, only re-aggregating necessary rows. These indexes can be based on single or multiple tables.

The aggregate join index enables the definition of a summary table without breaching the normalization of the data model.

Here is an instance of a join index that aggregates a single table:

CREATE JOIN INDEX  AS
SELECT t01.PK, SUM(t01.COL_A)
FROM
 t01
GROUP BY 1
PRIMARY INDEX (PK);

Aggregate join indexes support a limited range of aggregation functions, including SUM, COUNT, MIN, and MAX.

An aggregate join index is utilized only when the following conditions are met:

The grouping clause must include all columns in the grouping clause of the query.
WHERE clause columns belonging to tables not defined in the aggregate join index must be part of the join index select list.
For an aggregate join index with a PPI, an aggregated column cannot be a partitioning column.

The Teradata Optimizer may rewrite a query using an existing Aggregate Join Index.

The following scenarios will explore the utilization of an existing Aggregate Join Index.

For the coverage of queries containing a subquery (derived table spool)
For internally created distinct and grouping steps
To partly cover outer join queries
For partial/early grouping steps

Sparse Join Index

A WHERE condition applied to a join index transforms it into a sparse join index.

The join index exclusively stores rows that meet the WHERE condition, resulting in space conservation. Nonetheless, a sparse index is only practical when queries incorporate the WHERE condition; otherwise, the join index remains unusable.

A concise join index requires less permanent storage as it only stores the rows that match the WHERE conditions.

This provides an additional advantage as the index maintenance is optimized.

Before inserting a row, it undergoes a check to ensure it satisfies the WHERE condition. If it meets the WHERE condition criteria, it is eligible for insertion. Additionally, Teradata verifies if the row satisfies the partition expression.

Join Index Summary

To utilize a join index, the Optimizer requires sufficient preparation and knowledge.

Utilizing a join index is practical for consistent queries. However, discovering a beneficial set of join indexes that the Optimizer also utilizes can be quite daunting in systems with numerous impromptu queries.

Utilizing join indexes in a production environment may result in unforeseen consequences. Here’s an actual illustration:

A daily batch load had a SQL step deleting a huge 6+ billion records table:

DELETE FROM the_huge_table;

I overlooked this DELETE step in the load process and added a multi-table join index to the table.

The result was that the subsequent batch load took 20 hours to maintain the join index triggered by the DELETE.

I had to cancel the DELETE, and so did the subsequent ROLLBACK. This is always critical because the table is then in an inconsistent state. But that didn’t matter in this case because the table should be deleted entirely.

The following query can be used to retrieve all join indexes present in a Teradata system:

SELECT * FROM dbc. Indices WHERE indextype='J';

Refer to:
The NUSI Compendium: A Guide to the Secondary Index in Teradata

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

8 thoughts on “Understanding the Teradata Join Index: Benefits, Usage, and Implementation”

Siyanda B

02/04/2019 at 11:58 am

Failed [5464 : HY000] Error in Join Index DDL, Only satisfiable single-table conditions and equality join conditions are allowed in the ON clause. What causes this error, I am struggling to resolve the because I do not understand the problem. When I execute the query (without CREATE JOIN INDEX statement), the query executes with no issues but when I encapsulate it in a JI it refuses.
Paramananda

09/08/2018 at 12:26 pm

if we have a single table join index(st_jn_idx) on dept_no for emp table.
How to join the dept table and the join single table join index on emp table.

Can we write,
select emp.*
from dept as d join st_jn_idx as e
on d.dept_no=e.dept_no;

Kindly suggest.

Thanks