The Teradata Collect Statistics Using Sample feature explained
Collecting sample statistics on a Teradata system helps to cut workload. If used cautiously, it allows creating as good execution plans as full statistics.
What is sample statistics? The optimizer will use a sample of 2% of the rows to get information about the data demographics. We can easily change the sample size, by running the statement below (The new sample size is valid until you log off your session):
DIAGNOSTIC “COLLECTSTATS, SAMPLESIZE=n” ON FOR SESSION;
When can sample statistics be applied? For sample statistics, it’s crucial, that the table is not skewed. Skewing can be found by executing below query:
SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)
GROUP BY 1;
If the skew is small (<= 5%), sample statistics can be used. I would not be concerned about collecting full statistics on small tables, but a collection of comprehensive statistics on big tables consumes a lot of CPU and Disk IOs and should be considered as sample statistics candidates:
COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);
Sample statistics are not possible on GLOBAL TEMPORARY TABLES, JOIN INDICES and VOLATILE TABLES.