Teradata Explain Statement: A Guide to Optimizing SQL Performance

Roland Wenzlofsky

April 21, 2023

minutes reading time


What is the Teradata Explain Statement?

Optimizing performance can be challenging, often requiring significant effort to identify the root issues. Thankfully, Teradata provides the EXPLAIN statement as a robust tool for assessing the execution of SQL statements. In Teradata, SQL statements are processed in a series of distinct tasks that correspond to specific components of the statement, such as retrieving rows, joining tables, or aggregating results. By utilizing our knowledge of how Teradata handles these tasks and scrutinizing the execution plan furnished by EXPLAIN, we can swiftly pinpoint weak points within a query and utilize this data to enhance performance.

The EXPLAIN statement can be intimidating for novices. At first, the result may seem disorderly and incomprehensible. Nonetheless, deciphering the steps is easier than it seems. We must evaluate each step’s effect on performance and whether Teradata has made ideal selections, such as choosing an inappropriate join strategy.

Our analysis will entail a thorough evaluation of every Execution Plan element, identification of crucial operations, and support in identifying potential issues.

The Execution Plan comprises retrieval, joining, aggregation, and other tasks. Nonetheless, these three tasks make up 95% of all steps; therefore, attention will be paid to them. In brief, the Execution Plan entails reading, joining, and aggregating, ultimately creating a final result set. Each step depends on the preceding one’s output.

Decoding the Plan Steps

Teradata’s blocks correspond to distinct operations, including fetching rows, joining tables, and conducting aggregations. In addition to the operation category, we collect information on the number of AMPs involved, including All-AMPs, Single-AMP, or Group-AMP.

Each block’s EXPLAIN output will show something akin to:

AMPs retrieve, and single AMP retrieve steps, AMPs join, and single AMP join steps, and an AMPs step for aggregation.

By identifying the various types of operations, we can analyze and obtain detailed information about each one.

Prior to each transaction, certain preparation steps are necessary, such as redistributing and sorting the rows. Sorting is essential to utilize the most efficient search algorithm, specifically the binary search.

Join Preparation

Redistribution is required as the AMPs hold the rows exclusively for joining. Since the AMPs are separate processes and lack access to other AMPs’ disks, the two tables’ rows to be joined must reside on the same AMP. To accomplish this, the easiest method is rehashing one or both tables in question.

The EXPLAIN result for row movement might show something like:

duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey, etc.

The EXPLAIN output’s sorting section will show something akin to this:

Sort according to hash code, order by row hash, partition by rowkey, etc.

Row retrieval strategy

A vital aspect of all operations is retrieving rows from disks. The Teradata Database offers various access paths to obtain data, such as full table scans, primary and secondary index access, and more.

The EXPLAIN output for row retrieval may contain the following information:

by way of an all-row scan, by way of a rowhash match scan, by way of the primary index, by way of the hash value, etc.

Join Type

The EXPLAIN output will provide information on the selected join strategy if the operation is a join.

using a product join, a single partition hash join, a merge join, a rowkey-based merge join, etc.

Confidence

Each block denotes the level of confidence in its corresponding rows. Although desirable, attaining a high level of confidence is not always possible.

The EXPLAIN output will display a similar format:

high confidence, low confidence, no confidence, and join index confidence

This article explains the essential elements of an EXPLAIN statement, delivering about 80% of the relevant data present in an explanation output.

  • Hi Roland,

    Nice article. Could you please also explain product join indicator ie. what it means and how it affects query performance. What causes high PJI for a query. Does only product join is cause high PJI or there are other factors that cause high PJI?

    Thank you

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >