The Teradata Explain Statement

Roland Wenzlofsky

March 12, 2014

minutes reading time

What is the Teradata Explain Statement?

If you are new to Teradata, performance optimization of a single SQL statement can be pretty challenging, and you may not know where to begin.

Luckily, we have a handy tool that gives us valuable information for performing this task: The Teradata EXPLAIN statement.

By combining the explain statement with the knowledge about how SQL statements are executed technically on a Teradata system, you will understand the quality of each query. This information allows you to track down performance issues on short notice (however, solving them is another challenge).

Although you might feel overwhelmed when interacting for the first time with the explain statement’s output, all you need to do is to deconstruct the parts that are making up the execution plan.

But first, let’s see how Teradata strips down a SQL statement into some atomic operations, repeating over and over during query execution and contributing to the final result set.

The principle is straightforward:

Take two tables (or spools, temporary tables used by Teradata during query execution) and join them together, creating a target spool. Proceed to join sets of two spools until the final result set is built. It’s as simple as that. Intermediate spools are input to the next step in the execution plan until the overall query execution is finished.

Some steps are only related to data retrieval involving just one table.

Although the above-described process is a clear presentation, it will serve you well in performing performance optimization on the query level.

Now let’s proceed to the next step, i.e., reading and understanding the EXPLAIN statement’s plain text output. If you look at it for the first time, you may not recognize any structure. We will deconstruct the output into its pieces to carve out the relevant information behind it.

An explanation’s output shows the sequence of two-table joins and describes our first breakdown level. Prepared with this knowledge about blocks of two-table joins, we can dig deeper into one of these blocks and analyze what’s happening in detail.


Each of these blocks represents an individual operation on Teradata: these are mainly retrieving rows, joining the rows of the two tables together, and doing an aggregation. Together with the type of operation, we get information about how many AMPs take part,  which can be All-AMPs, Single-AMP, or Group-AMP.

You will see in the explain output for each block something like:

All-AMPs retrieve step, Single-AMP retrieves step, All-AMP join step, Single-AMP join step, All-AMPs step to aggregate.

Now that we know which kind of operations exists, we can deconstruct each of them to get further essential details.

Some preparation steps may precede each transaction. These are mainly the redistribution and sorting of rows. Sorting is just a required step to be able to apply the fastest available search algorithm, namely the binary search.

Join Preparation

Redistribution is needed as join steps are done by the AMPs holding the rows to be joined. As AMPs are independent processes, with no access possibility to the disks of other AMPs, the rows to be joined from two tables have to be on the same AMP—the easiest way to meet this: rehash one or both tables taking part in the join.

You will see something like the following in the explain output for row movement:

duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey, etc.

You will see something like the following in the explain output for sorting:

sort to order by hash code, sort to order by row hash, some kind to partition by rowkey, etc.

Row retrieval strategy

One crucial piece of information all operations carry is the way of retrieving rows from the disks. There are several access paths to get the data from the Teradata Database like full table scans, primary index access, secondary index access, etc.

You will see something like the following in the explain output for row retrieval:

by way of an all-row scan, by way of a rowhash match scan, by way of the primary index, by way of the hash value, etc.

Join Type

Finally, if the operation is a join operation, the explain output will tell you exactly which kind of join strategy was chosen:

using a product join, a single partition hash join, a merge join, a rowkey-based merge join, etc.


Each block provides information about the level of confidence about the number of resulting rows it has, too. High confidence would be ideal, but it is not always possible.

In the explain output, you will see something like:

high confidence, low confidence, no confidence, and join index confidence

Summary:  This article decoded the build blocks of an explain statement. It is not complete but covers probably 80% of the valuable information available in any explain output.

  • Hi Roland,

    Nice article. Could you please also explain product join indicator ie. what it means and how it affects query performance. What causes high PJI for a query. Does only product join is cause high PJI or there are other factors that cause high PJI?

    Thank you

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like