Teradata DIAGNOSTIC HELPSTATS ON FOR SESSION | DWHPRO

Teradata DIAGNOSTIC HELPSTATS ON FOR SESSION

Statistics are the most important topic in Teradata Performance Tuning. There are a lot of rules you should follow when creating statistics. For example:

  • On which columns should statistics be collected?
  • What kind of statistics should be collected (full statistics, sample statistics or is random AMP sampling sufficient)?
  • How often should statistics be collected?

These were just a few considerations that need to be made to create a successful statistics framework.

It is important to note that statistics should always be optimized before considering further SQL tuning measures.

This is because collecting statistics changes the execution plan and therefore the tuning measures taken were in vain.

Fortunately, there is a command in Teradata which helps to determine which columns statistics should be collected:

What Is DIAGNOSTIC HELPSTATS ON FOR SESSION?

This command shows you when you run an EXPLAIN at the end of the execution plan all statistics which the optimizer would need to improve the plan. Here is an example:

DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN SELECT * FROM Customer WHERE CustomerId IN 
( SELECT CustomerId FROM Sales );

BEGIN RECOMMENDED STATS FOR THIS Relation->
      -- "COLLECT STATISTICS COLUMN (CustomerId) ON DWHPRO.CUstomer" 
      (High Confidence)
      -- "COLLECT STATISTICS COLUMN ( CustomerId ) ON DWHPRO. Sales" 
      (High Confidence)
<-END RECOMMENDED STATS

How to apply the proposed statics is left up to you.

For example, you can add one after the other and then test your query each time.

Another approach is to add all the stats at once and then remove them one by one if they don't improve the query.

My Personal Approach

Personally, I prefer the approach of first adding the statistics with “high confidence” and concentrating mainly on single-column statistics.

Next, I will take care of multi-column statistics with “high confidence” and only at the very end I will look at the suggestions for statistics with “medium confidence”.

Traps When Using DIAGNOSTIC HELPSTATS ON FOR SESSION

One thing you must be aware of: Not all of the proposed statistics have added value, i.e. they change the execution plan. Many of them can be redundant and have no influence on the execution plan.

When collecting statistics you have to consider that this requires a lot of resources, especially on large tables. So you should really only collect statistics that improve the execution plan.

A disadvantage that the Teradata DIAGNOSTIC HELPSTATS ON FOR SESSION has:

It gives us no information about what kind of statistics to collect (full or sample?). Therefore further analysis is needed.

What is the Teradata Command HELPSTATS ON FOR SESSION?

Should statistics always be collected for all proposals?

No, because collecting statistics requires many resources. Only collect statistics if they improve your queries.

Does this command give me information about which statistics are more important?

Yes, by including the “confidence level”: “high confidence”, “medium confidence”, “low confidence”.

How do I know what kind of statistics to create?

This information is not provided. So it is up to you to determine if a column is skewed and then choose the appropriate type of statistic.

Does DIAGNOSTIC HELPSTATS ON FOR SESSION provide all required statistics for a table?

No, the proposed statistics only apply to the SQL query in question

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

>