The Teradata Explain Statement
What is the Teradata Explain Statement?
If you are new to Teradata, performance optimization of a single SQL statement can be quite challenging, and you may not know where to begin
Luckily, we have a very useful tool which gives us the valuable information which is needed for performing this task: The Teradata EXPLAIN statement.
By combining the usage of the explain statement with the knowledge about how SQL statements are executed technically on a Teradata system, you will be able to understand the quality of each query. This information allows you to track down performance issues on short notice (however, solving them is another challenge).
Although you might feel overwhelmed when interacting for the first time with the output of the explain statement, all you need to do is to deconstruct the parts which are making up the execution plan.
But first, let’s see how Teradata strips down a SQL statement into some atomic operations, which are repeating over and over during query execution and are contributing to the finally a result set.
The principle is straightforward:
Take two tables (or spools, which are temporary tables used by Teradata during query execution) and join them together, creating a target spool. Proceed to join sets of two spools until the final result set is built. It’s as simple as that. Intermediate spools are input to the next step in the execution plan until the overall query execution is finished.
Some steps are only related to data retrieval involving just one table.
Although the above-described process is a very clear presentation, it will serve you well to be able to do performance optimization on query level.
Now let’s proceed to the next step, i.e. reading and understanding the plain text output delivered by the EXPLAIN statement. If you look at it for the first time, you may not recognize any structure in it. We will deconstruct the output into its pieces to carve out the relevant information behind.
The output of an explain shows the exact sequence of two-table joins and described our first level of breakdown. Prepared with this knowledge about blocks of two-table joins we can dig deeper into one of this blocks and analyze what’s happening in detail.
Each of this blocks represents an individual operation on Teradata: these are mainly retrieving of rows, joining the rows of the two tables together, doing an aggregation. Together with the type of operation, we get information about how many AMPs take part, which can be All-AMPs, Single-AMP, Group-AMP.
You will see in the explain output for each block something like:
All-AMPs retrieve step, Single-AMP retrieves step, All-AMP join step, Single-AMP join step, All-AMPs step to aggregate.
Now that we know which kind of operations exists, we can start to deconstruct each of them to get further important details.
Some preparation steps may precede each transaction. These are mainly the redistribution and sorting of rows. Sorting is just a required step to be able to apply the fastest available search algorithm, namely the binary search.
Redistribution is needed as join steps are done by the AMPs holding the rows to be joined. As AMPs are independent processes, with no access possibility to the disks of other AMPs, the rows to be joined from two tables have to be on the same AMP. The easiest way to meet this: rehash one or both tables taking part in the join.
You will see something like the following in the explain output for row movement:
duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey etc.
You will see something like the following in the explain output for sorting:
sort to order by hash code, sort to order by row hash, sort to partition by rowkey etc.
Row retrieval strategy
One important information all operations carry is the way of retrieving rows from the disks. There are several access paths to get out the data from the Teradata Database like full table scans, primary index access, secondary index access, etc.
You will see something like the following in the explain output for row retrieval:
by way of an all-row scan, by way of rowhash match scan, by way of the primary index, by the way of hash value etc.
Finally, if the operation is a join operation, the explain output will tell you exactly which kind of join strategy was chosen:
using a product join, using a single partition hash join, using a merge join, using a rowkey based merge join etc.
Each block provides information about the level of confidence about the number of resulting rows it has, too. High confidence would be ideal but is not always possible.
In the explain output you will see something like:
high confidence, low confidence, no confidence and join index confidence
Summary: This article decoded the build blocks of an explain statement. It is by far not complete but covers probably 80% of the valuable information available in any explain output.