Optimizing Performance In Teradata With DIAGNOSTIC HELPSTATS ON FOR SESSION Command

Table of Contents

What is the DIAGNOSTIC HELPSTATS ON FOR SESSION Command in Teradata?

Gathering statistics is a crucial aspect of optimizing performance.

Collecting statistics requires resources, and it matters whether they are used only once in the ETL process or cover multiple queries’ workloads simultaneously. Fortunately, Teradata’s recent versions can be configured so that the “COLLECT STATISTIC” statement runs only when the data demographics change significantly. This conditional collection enables us to add statistics to specific script locations that were previously impossible or impractical. However, it fails to address all issues arising from decentralized statistics collection. Collecting statistics always involves a tradeoff between collection costs and the cost of a suboptimal execution plan for queries. Key factors to consider are:

For which column(s) should statistics be collected?
What kind of statistics should be collected (full statistics, sample statistics, or random AMP sampling)?
How often should statistics be collected?

Prioritizing the gathering of statistics is crucial for successful performance tuning. This preliminary task enables the Optimizer to formulate an optimal execution plan. Occasionally, this may be the sole necessary tuning action. Teradata offers a command that displays the required statistics for the best execution plan.

What Is DIAGNOSTIC HELPSTATS ON FOR SESSION?

Enabling “DIAGNOSTIC HELPSTATS ON FOR SESSION” provides a practical option. The Optimizer displays the necessary statistics for creating the most effective execution plan at the conclusion of the Explain statement’s output. Here is an example:

DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN SELECT * FROM Customer WHERE CustomerId IN 
( SELECT CustomerId FROM Sales );

BEGIN RECOMMENDED STATS FOR THIS Relation->
      -- "COLLECT STATISTICS COLUMN (CustomerId) ON DWHPRO.Customer" 
      (High Confidence)
      -- "COLLECT STATISTICS COLUMN ( CustomerId ) ON DWHPRO.Sales" 
      (High Confidence)
<-END RECOMMENDED STATS

Determining the efficacy of the suggested statistics is at our discretion. We may opt to incorporate the statistics incrementally and promptly evaluate the query after each adjustment to assess any improvements. Alternatively, we could incorporate all recommended statistics at once and subsequently eliminate them iteratively, retaining only those that result in a superior execution plan.

My Approach

I add statistics with “high confidence” and primarily single-column statistics to quickly obtain results. Then, I assess the suggested multi-column statistics with “high confidence,” and only after that do I consider statistics proposals with “medium confidence.” Blindly accepting all optimizer suggestions is not advisable; statistics should only be added to enhance the execution plan. A query test run and comparison of required resources before and after the changes can be useful for evaluating the effectiveness of changes. One approach is to use a specific query band for each test, analyzing how changes affect CPU seconds, skew, and IOs in the table DBC.DBQLOGTBL. Experience has shown that redundant statistics do not impact the plan’s quality. Although DIAGNOSTIC HELPSTATS ON FOR SESSION provides an initial overview, it cannot replace detailed analysis later.

Traps When Using DIAGNOSTIC HELPSTATS ON FOR SESSION

Not all proposed statistics enhance the execution plan. Some recommendations may be redundant, incurring only additional costs for data collection without impacting the execution plan.

DIAGNOSTIC HELPSTATS ON FOR SESSION has the drawback of solely indicating the statistics to gather without specifying the method (full or sample); thus, further scrutiny is required to obtain comprehensive information.

What Is the Teradata Command HELPSTATS ON FOR SESSION?

Should statistics always be collected for all proposals?

No, because collecting statistics consumes resources. We only collect statistics if they improve our queries.

Does this command give us information about which statistics are more important than others?

Yes, by including the confidence level: “high confidence”, “medium confidence”, “low confidence”.

How do we know what kind of statistics to create?

The Optimizer does not provide this information. So it is up to us to determine if a column is skewed and then choose the appropriate type of statistic.

Does DIAGNOSTIC HELPSTATS ON FOR SESSION provide all required statistics for a table?

No, the proposed statistics only apply to the SQL query in question.

Can I use DIAGNOSTIC HELPSTATS ON FOR SESSION for Volatile Tables?

Teradata does not store the definition of volatile tables in the data dictionary. The optimizer’s suggestions help add statistics to volatile tables. However, the same recommendations are displayed again after collecting the statistics because this information is not available in DBC.STATSV.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

2 thoughts on “Optimizing Performance in Teradata with DIAGNOSTIC HELPSTATS ON FOR SESSION Command”

Alexey Usman

07/14/2022 at 1:40 pm

I usually also run these commands along with
DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION
to get a recommendation for collecting statistics for a specific step.
- DWH Pro Admin
  
  07/14/2022 at 8:47 pm
  
  Thanks for this hint. I will add the command to the article.

Optimizing Performance in Teradata with DIAGNOSTIC HELPSTATS ON FOR SESSION Command