What is the Teradata Explain Statement?
Performance tuning is often not an easy task. Finding the root cause of a performance problem can be costly. Fortunately, Teradata provides the EXPLAIN statement, a tool that gives us valuable information about how our SQL statement is executed. As we know, each SQL in Teradata is processed as a sequence of individual tasks. Most of these tasks correspond 1:1 to the individual components of our SQL statement. This includes retrieving rows, joining tables, aggregating results, and so on. Suppose we combine our knowledge of how Teradata executes these tasks internally with the execution plan provided by EXPLAIN. In that case, it is relatively easy to identify the weak points of a query and use them as a basis for our tuning activity.
It can be overwhelming if we are dealing with the EXPLAIN statement for the first time. At first sight, the output of the EXPLAIN statement looks like a chaotic text with which you can’t do much if you lack the experience to decode it accordingly. But it is easier than it seems at first. We only have to decode the individual steps and consider for each step how it affects the performance and whether Teradata has possibly not chosen the best option (e.g., an inappropriate join strategy).
We will go through each Execution Plan’s building blocks, work out the atomic operations and give you tips on what to look out for.
The Execution Plan is a sequence of retrieve steps, joins, aggregations, and a few other tasks. But the three mentioned tasks already cover 95% of all steps, so we will focus our attention on them. In summary, an Execution PLan is a sequence of reading steps, joins, and aggregations until the final result set is generated. The intermediate results of a step are the input for the next step.
Decoding the Plan Steps
Each of these blocks represents an individual operation on Teradata: these are mainly retrieving rows, joining the rows of the two tables together, and doing aggregations. Together with the type of operation, we get information about how many AMPs take part, which can be All-AMPs, Single-AMP, or Group-AMP.
You will see in the EXPLAIN output for each block something like:
All-AMPs retrieve step, Single-AMP retrieves step, All-AMP join step, Single-AMP join step, All-AMPs step to aggregate.
Now that we know which kind of operations exists, we can deconstruct each of them to get further essential details.
Some preparation steps may precede each transaction. These are mainly the redistribution and sorting of rows. Sorting is just a required step to be able to apply the fastest available search algorithm, namely the binary search.
Join Preparation
Redistribution is needed as join steps are done by the AMPs holding the rows to be joined. As AMPs are independent processes, with no access possibility to the disks of other AMPs, the rows to be joined from two tables have to be on the same AMP—the easiest way to meet this: rehash one or both tables taking part in the join.
You will see something like the following in the explain output for row movement:
duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey, etc.
You will see something like the following in the explain output for sorting:
sort to order by hash code, sort to order by row hash, some kind to partition by rowkey, etc.
Row retrieval strategy
One crucial piece of information all operations carry is the way of retrieving rows from the disks. There are several access paths to get the data from the Teradata Database like full table scans, primary index access, secondary index access, etc.
You will see something like the following in the explain output for row retrieval:
by way of an all-row scan, by way of a rowhash match scan, by way of the primary index, by way of the hash value, etc.
Join Type
Finally, if the operation is a join operation, the explain output will tell you exactly which kind of join strategy was chosen:
using a product join, a single partition hash join, a merge join, a rowkey-based merge join, etc.
Confidence
Each block provides information about the level of confidence about the number of resulting rows it has, too. High confidence would be ideal, but it is not always possible.
In the explain output, you will see something like:
high confidence, low confidence, no confidence, and join index confidence
Summary: This article decoded the build blocks of an explain statement. It is not complete but covers probably 80% of the valuable information available in any explain output.
Hi Roland,
Nice article. Could you please also explain product join indicator ie. what it means and how it affects query performance. What causes high PJI for a query. Does only product join is cause high PJI or there are other factors that cause high PJI?
Thank you