What is the command DIAGNOSTIC HELPSTATS ON FOR SESSION in Teradata?

Collecting statistics is one of the essential activities for performance tuning.

But we must take into account that collecting statistics requires resources. So it makes a difference if we collect statistics that are used only once, e.g., in the ETL process or if the collected statistics cover the workload of many queries simultaneously. Fortunately, we can configure the more recent versions of Teradata so that each time a “COLLECT STATISTIC” statement is issued, Teradata executes the collect statistics statement only if the data demography has changed significantly. Making the collection of statistics conditional allows us to incorporate statistics in certain places in our scripts that were not possible or useful. Still, it does not solve all the problems caused by the decentralized collection of statistics. Collecting statistics is always a tradeoff of collecting against the cost of a worse execution plan for our queries. Important considerations are:

  • For which, column(s) should statistics be collected?
  • What kind of statistics should be collected (full statistics, sample statistics, or random AMP sampling sufficient)?
  • How often should statistics be collected?

The collection of statistics should always be the first step in performance tuning; only then should further tuning be considered. Collecting statistics allows the Optimizer to create the best possible execution plan. Still, we should take further steps if necessary. Often the collection of statistics is the only tuning activity needed. Teradata provides a command which shows as output which statistics the Optimizer would need to create the best possible execution plan.

What Is DIAGNOSTIC HELPSTATS ON FOR SESSION?

The command “DIAGNOSTIC HELPSTATS ON FOR SESSION” enables a practical option. At the end of the output text of the Explain statement, the Optimizer displays the statistics it would need to create an optimal execution plan. Below you can see an example:

DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN SELECT * FROM Customer WHERE CustomerId IN 
( SELECT CustomerId FROM Sales );

BEGIN RECOMMENDED STATS FOR THIS Relation->
      -- "COLLECT STATISTICS COLUMN (CustomerId) ON DWHPRO.CUstomer" 
      (High Confidence)
      -- "COLLECT STATISTICS COLUMN ( CustomerId ) ON DWHPRO. Sales" 
      (High Confidence)
<-END RECOMMENDED STATS

How we test the benefits of the proposed statics is our decision. We could add one statistic after the other and check the query immediately after each change for improvements, or we could add all the suggested statistics at once and then remove them one by one, leaving only the ones which caused a better execution plan.

My Approach

To arrive at a result quickly, I add all statistics with “high confidence” and mainly single-column statistics. Next, I check the proposed multi-column statistics with “high confidence” and only at the very end do I consider the proposals for statistics with “medium confidence”. It is essential not to accept all optimizer suggestions blindly but only add statistics to improve the execution plan. A test run of the query and comparing the required resources before and after the changes are helpful. The easiest way is to use a specific queryband for each test and look in the table DBC.DBQLOGTBL how the changes affect CPU seconds, skew, and IOs. Experience has shown that the Optimizer of redundant statistics does not affect the plan’s quality. DIAGNOSTIC HELPSTATS ON FOR SESSION is a valuable tool to get a first overview but cannot replace the detailed analysis later.

Traps When Using DIAGNOSTIC HELPSTATS ON FOR SESSION

Not all of the proposed statistics will improve the execution plan. Many of these recommendations can be redundant, so they do not change the execution plan but only cause additional costs for collecting them.

A disadvantage of DIAGNOSTIC HELPSTATS ON FOR SESSION is that it only tells us which statistics to collect, but not how to (full or sample); further analysis is needed to get all the details.

What is the Teradata Command HELPSTATS ON FOR SESSION?

Should statistics always be collected for all proposals?

No, because collecting statistics consumes resources. We only collect statistics if they improve our queries.

Does this command give us information about which statistics are more important than others?

Yes, by including the confidence level: “high confidence”, “medium confidence”, “low confidence”.

How do we know what kind of statistics to create?

The Optimizer does not provide this information. So it is up to us to determine if a column is skewed and then choose the appropriate type of statistic.

Does DIAGNOSTIC HELPSTATS ON FOR SESSION provide all required statistics for a table?

No, the proposed statistics only apply to the SQL query in question

Can I use DIAGNOSTIC HELPSTATS ON FOR SESSION for Volatile Tables?

Teradata does not store the definition of volatile tables in the data dictionary. The suggestions of the optimizer help add statistics to Volatile Tables. Still, it is so that the same recommendations are displayed again after collecting the statistics because this information is not available in DBC.STATSV.

  • I usually also run these commands along with
    DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION
    to get a recommendation for collecting statistics for a specific step.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >