What is the Teradata Explain Statement?
If you are new to Teradata, performance optimization of a single SQL statement can be quite challenging, and you may not know where to begin.
Luckily, we have a handy tool that gives us valuable information for performing this task: The Teradata EXPLAIN statement.
By combining the use of the explain statement with the knowledge about how SQL statements are executed technically on a Teradata system, you will understand the quality of each query. This information allows you to track down performance issues on short notice (however, solving them is another challenge).
Although you might feel overwhelmed when interacting for the first time with the explain statement’s output, all you need to do is to deconstruct the parts that are making up the execution plan.
But first, let’s see how Teradata strips down a SQL statement into some atomic operations, which are repeating over and over during query execution and are contributing to the final a result set.
The principle is straightforward:
Take two tables (or spools, which are temporary tables used by Teradata during query execution) and join them together, creating a target spool. Proceed to join sets of two spools until the final result set is built. It’s as simple as that. Intermediate spools are input to the next step in the execution plan until the overall query execution is finished.
Some steps are only related to data retrieval involving just one table.
Although the above-described process is an obvious presentation, it will serve you well to perform performance optimization on the query level.
Now let’s proceed to the next step, i.e., reading and understanding the EXPLAIN statement’s plain text output. If you look at it for the first time, you may not recognize any structure in it. We will deconstruct the output into its pieces to carve out the relevant information behind it.
The output of an explain shows the exact sequence of two-table joins and described our first breakdown level. Prepared with this knowledge about blocks of two-table joins, we can dig deeper into one of these blocks and analyze what’s happening in detail.
Each of these blocks represents an individual operation on Teradata: these are mainly retrieving of rows, joining the rows of the two tables together, doing an aggregation. Together with the type of operation, we get information about how many AMPs take part, which can be All-AMPs, Single-AMP, Group-AMP.
You will see in the explain output for each block something like:
All-AMPs retrieve step, Single-AMP retrieves step, All-AMP join step, Single-AMP join step, All-AMPs step to aggregate.
Now that we know which kind of operations exists, we can start to deconstruct each of them to get further essential details.
Some preparation steps may precede each transaction. These are mainly the redistribution and sorting of rows. Sorting is just a required step to be able to apply the fastest available search algorithm, namely the binary search.
Redistribution is needed as join steps are done by the AMPs holding the rows to be joined. As AMPs are independent processes, with no access possibility to the disks of other AMPs, the rows to be joined from two tables have to be on the same AMP—the easiest way to meet this: rehash one or both tables taking part in the join.
You will see something like the following in the explain output for row movement:
duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey, etc.
You will see something like the following in the explain output for sorting:
sort to order by hash code, sort to order by row hash, sort to partition by rowkey, etc.
Row retrieval strategy
One crucial information all operations carry is the way of retrieving rows from the disks. There are several access paths to get out the data from the Teradata Database like full table scans, primary index access, secondary index access, etc.
You will see something like the following in the explain output for row retrieval:
by way of an all-row scan, by way of rowhash match scan, by way of the primary index, by way of hash value, etc.
Finally, if the operation is a join operation, the explain output will tell you exactly which kind of join strategy was chosen:
using a product join, using a single partition hash join, using a merge join, using a rowkey based merge join, etc.
Each block provides information about the level of confidence about the number of resulting rows it has, too. High confidence would be ideal, but it is not always possible.
In the explain output, you will see something like:
high confidence, low confidence, no confidence, and join index confidence
Summary: This article decoded the build blocks of an explain the statement. It is by far not complete but covers probably 80% of the valuable information available in any explain output.