- On which columns should statistics be collected?
- What kind of statistics should be collected (full statistics, sample statistics or is random AMP sampling sufficient)?
- How often should statistics be collected?
These were just a few considerations that need to be made to create a successful statistics framework.
It is important to note that statistics should always be optimized before considering further SQL tuning measures.
This is because collecting statistics changes the execution plan and therefore the tuning measures taken were in vain.
Fortunately, there is a command in Teradata which helps to determine which columns statistics should be collected:
What Is DIAGNOSTIC HELPSTATS ON FOR SESSION?
This command shows you when you run an EXPLAIN at the end of the execution plan all statistics which the optimizer would need to improve the plan. Here is an example:
DIAGNOSTIC HELPSTATS ON FOR SESSION; EXPLAIN SELECT * FROM Customer WHERE CustomerId IN ( SELECT CustomerId FROM Sales ); BEGIN RECOMMENDED STATS FOR THIS Relation-> -- "COLLECT STATISTICS COLUMN (CustomerId) ON DWHPRO.CUstomer" (High Confidence) -- "COLLECT STATISTICS COLUMN ( CustomerId ) ON DWHPRO. Sales" (High Confidence) <-END RECOMMENDED STATS
How to apply the proposed statics is left up to you.
For example, you can add one after the other and then test your query each time.
Another approach is to add all the stats at once and then remove them one by one if they don't improve the query.
My Personal Approach
Personally, I prefer the approach of first adding the statistics with “high confidence” and concentrating mainly on single-column statistics.
Next, I will take care of multi-column statistics with “high confidence” and only at the very end I will look at the suggestions for statistics with “medium confidence”.
Traps When Using DIAGNOSTIC HELPSTATS ON FOR SESSION
One thing you must be aware of: Not all of the proposed statistics have added value, i.e. they change the execution plan. Many of them can be redundant and have no influence on the execution plan.
When collecting statistics you have to consider that this requires a lot of resources, especially on large tables. So you should really only collect statistics that improve the execution plan.
A disadvantage that the Teradata DIAGNOSTIC HELPSTATS ON FOR SESSION has:
It gives us no information about what kind of statistics to collect (full or sample?). Therefore further analysis is needed.
What is the Teradata Command HELPSTATS ON FOR SESSION?
No, because collecting statistics requires many resources. Only collect statistics if they improve your queries.
Yes, by including the “confidence level”: “high confidence”, “medium confidence”, “low confidence”.
This information is not provided. So it is up to you to determine if a column is skewed and then choose the appropriate type of statistic.
No, the proposed statistics only apply to the SQL query in question