Joins as a Main Target for Teradata Tuning

1
988
Teradata UNION ALL

Joining tables is one of the most expensive operations of an SQL statement. This is because each join requires the rows to be joined on a common AMP.

If the two tables do not have the same primary index, it is necessary to copy the rows of one or both tables. This can cause a huge number of IOs in large tables.

For each join, the Teradata Optimizer can choose between different types of joins, each suitable for a specific initial scenario. The main goal of the Teradata Optimizer is to minimize resource consumption (IOs, CPU consumption, etc.) for the entire execution plan.

General Join Optimization Considerations

The optimizer makes use of available statistics in order to determine the optimal join plan.

The optimizer‘s cost estimations for each join are based on the following factors: The expected cardinality of the output spool (after the join) and the expected cardinality of the input spools after WHERE conditions have been applied.

The selected column lists of each input spool and the average row size. Usually, only the selected columns of each input spool are taken over into the resulting spool.

The only exception is if a permanent table is joined directly without spooling.

In such a case all columns of this table are taken over into the resulting spool.

Based on the above observations we can draw the following conclusions regarding performance:

The important statistics for each join are the ones on the join columns and on any available WHERE condition.
Whenever possible we should apply WHERE conditions in our queries in order to reduce the cardinality of the input tables.


further, we should only select the columns required in the result set and avoid „SELECT * FROM“ queries.

Our Reader Score
[Total: 11    Average: 3.4/5]
Joins as a Main Target for Teradata Tuning written by DWH Pro Admin on April 23, 2019 average rating 3.4/5 - 11 user ratings

1 COMMENT

  1. Hi Roland,
    Useful article to show the basics of getting good joins.
    There is one bit that I don’t understand, could you perhaps elaborate further on this? You say:

    “Usually, only the selected columns of each input spool are taken over into the resulting spool.
    The only exception is if a permanent table is joined directly without spooling.
    In such a case all columns of this table are taken over into the resulting spool. ”

    My understanding of what you’ve said is that if I join two tables on their PI columns – so we get an AMP local, PI:PI merge join – then all columns of both tables (only one table?) will get copied into the output spool file. Is that what you mean?

    If I provide a scenario involving two tables (T1 and T2:
    CREATE TABLE t1
    (col1 INTEGER NOT NULL
    ,col2 INTEGER
    ,col3 CHAR(15)
    ,col4 DATE
    )
    PRIMARY INDEX (col1);

    CREATE TABLE t2
    (col1 INTEGER NOT NULL
    ,col2 INTEGER
    ,col3 CHAR(15)
    ,col4 DATE
    )
    PRIMARY INDEX (col1);

    SELECT t1.col1
    ,t1.col3
    ,t1.col4
    ,t2.col2
    FROM t1 INNER JOIN t2
    ON t1.col1 = t2.col1;

    I’m only selecting 3 out of 4 columns from table T1, are you saying that all 4 columns from then table will go into the output spool file?

    I’ve probably mis-understood what you mean.

    Cheers,
    Dave

LEAVE A REPLY

Please enter your comment!
Please enter your name here