Collect Statistics in Teradata – The Decision
At this point, it is assumed you have settled all issues regarding the Teradata physical data model and the table design so that all your tables can be worked with regularly. It is now time to let Teradata know what to expect from a table before it is picked up in an SQL statement. The direction should be clear: make use of it! Provide information so that Teradata can estimate its path through a query close to reality. A free and powerful mechanism for doing this is the collection of statistics.
The Scope of the Statistics Collection
Statistics can be collected on a single column as well as on a combination thereof.
The physical table design determines part of what you collect, but apart from that, it should be driven by table usage practice.
The first set of collection options should always be chosen:
- on the dummy column PARTITION, irrespective of your table being partitioned or not
- on the Primary Index columns (in combination)
- on the Secondary Index columns (in combination)
- on the Columns used for partitioning
The second, usage-driven set of collect options consists of
- Join columns: columns that are typically used to join the table to others.
- WHERE condition columns: columns frequently used to select data subsets from the table
There are some attributes that, from our experience, do not qualify for collection and should, therefore, be omitted unless high case-to-case evidence indicates otherwise
- time and date columns in table landscapes of daily changes
- Code, type or flag columns hardly used in follow-up queries