Collect Statistics in Teradata – Part 2
Collect Statistics in Teradata – The Evaluation
After collecting on every combination considered necessary and helpful, you can check the result of the collected statistics on a table by looking at
- the time it took to collect and the then prevailing circumstances
- the collection results
If the collection took longer than expected under normal or even ideal conditions, keep that in mind when you come to maintenance and scheduling. Else, keep it simple and keep up the collect combination, especially if the table is tiny and all is just a matter of seconds.
If there are collect combinations whose result sets are very close, this is an indicator that some of them do not give any surplus information on the table content.
Look at this extract from a series of collect combinations:
14/03/10 09:01:35 462,452,442 L_ID,C_DT
14/03/10 09:02:48 462,455,048 L_ID,C_DT,CLASS
L_ID and C_DT amount to 462,452,442 distinct combinations under full sampling (of that later). Adding CLASS results in the almost the same set of combinations. So there must be one and only one CLASS for almost every combination of L_ID and C_DT. Collecting on the triple adds no value as we are almost down to every single row based on L_ID and C_DT alone. We can remove this combination from our collect combinations and so save some time and resources in the future.
There is still an advantage in such an “over-collection”: if the result comes as a surprise, it can be the one trigger that starts a data quality investigation earning you the fame of being the one who found it first!