What is the command DIAGNOSTIC HELPSTATS ON FOR SESSION in Teradata?
Gathering statistics is a crucial aspect of optimizing performance.
Collecting statistics requires resources, and it matters whether they are used only once in the ETL process or cover multiple queries’ workloads simultaneously. Fortunately, Teradata’s recent versions can be configured so that the “COLLECT STATISTIC” statement runs only when the data demography changes significantly. This conditional collection enables us to add statistics to specific script locations previously impossible or useful. However, it fails to address all issues arising from decentralized statistics collection. Collecting statistics always involves a tradeoff between collection and the cost of a suboptimal execution plan for queries. Key factors to consider are:
- For which column(s) should statistics be collected?
- What kind of statistics should be collected (full statistics, sample statistics, or random AMP sampling sufficient)?
- How often should statistics be collected?
Prioritizing the gathering of statistics is crucial for successful performance tuning. This preliminary task enables the Optimizer to formulate an optimal execution plan. Occasionally, this may be the sole necessary tuning action. Teradata offers a command that displays the required statistics for the best execution plan.
What Is DIAGNOSTIC HELPSTATS ON FOR SESSION?
Enabling “DIAGNOSTIC HELPSTATS ON FOR SESSION” provides a practical option. The Optimizer displays the necessary statistics for creating the most effective execution plan at the conclusion of the Explain statement’s output text. Here is an example:
DIAGNOSTIC HELPSTATS ON FOR SESSION; EXPLAIN SELECT * FROM Customer WHERE CustomerId IN ( SELECT CustomerId FROM Sales ); BEGIN RECOMMENDED STATS FOR THIS Relation-> -- "COLLECT STATISTICS COLUMN (CustomerId) ON DWHPRO.CUstomer" (High Confidence) -- "COLLECT STATISTICS COLUMN ( CustomerId ) ON DWHPRO. Sales" (High Confidence) <-END RECOMMENDED STATS
Determining the efficacy of the suggested statics is at our discretion. We may opt to incorporate the statistics incrementally and promptly evaluate the query after each adjustment to assess any enhancements. Alternatively, we could incorporate all recommended statistics at once and subsequently eliminate them iteratively, retaining only those that result in a superior execution plan.
I add statistics with “high confidence” and primarily single-column statistics to quickly obtain results. Then, I assess the suggested multi-column statistics with “high confidence,” and only after that do I consider statistics proposals with “medium confidence.” Blindly accepting all optimizer suggestions is not advisable; statistics should only be added to enhance the execution plan. A query test run and comparison of required resources before and after the changes can be useful to evaluate the effectiveness of changes. One approach is to use a specific query band for each test, analyzing how changes affect CPU seconds, skew, and IOs in the table DBC.DBQLOGTBL. Experience has shown that redundant statistics do not impact the plan’s quality. Although DIAGNOSTIC HELPSTATS ON FOR SESSION provides an initial overview, it cannot replace a detailed analysis later.
Traps When Using DIAGNOSTIC HELPSTATS ON FOR SESSION
Not all proposed statistics enhance the execution plan. Some recommendations may be redundant, incurring only additional costs for data collection without impacting the execution plan.
DIAGNOSTIC HELPSTATS ON FOR SESSION has the drawback of solely indicating the statistics to gather without specifying the method (full or sample); thus, further scrutiny is requisite to obtain comprehensive information.
What is the Teradata Command HELPSTATS ON FOR SESSION?
Should statistics always be collected for all proposals?
No, because collecting statistics consumes resources. We only collect statistics if they improve our queries.
Does this command give us information about which statistics are more important than others?
Yes, by including the confidence level: “high confidence”, “medium confidence”, “low confidence”.
How do we know what kind of statistics to create?
The Optimizer does not provide this information. So it is up to us to determine if a column is skewed and then choose the appropriate type of statistic.
Does DIAGNOSTIC HELPSTATS ON FOR SESSION provide all required statistics for a table?
No, the proposed statistics only apply to the SQL query in question
Can I use DIAGNOSTIC HELPSTATS ON FOR SESSION for Volatile Tables?
Teradata does not store the definition of volatile tables in the data dictionary. The suggestions of the optimizer help add statistics to Volatile Tables. Still, it is so that the same recommendations are displayed again after collecting the statistics because this information is not available in DBC.STATSV.