The Teradata Collect Statistics Using Sample feature explained
Utilizing statistical samples in a Teradata system can lessen the workload and produce execution plans equivalent to those resulting from complete statistics when implemented with care.
What are sample statistics? The optimizer utilizes a 2% row sample to gather demographic data information. The sample size can be adjusted by executing the statement below, and the new sample size will remain valid until the session ends:
DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;
When can sample statistics be applied effectively? Ensuring the table is not skewed is crucial in obtaining accurate sample statistics. To determine skewness, execute the following query:
SELECT HASHAMP (HASHBUCKET (HASHROW ()))Â , COUNT (*)
FROM <TABLE>
GROUP BY 1;
Sample statistics are adequate for skew values that are less than or equal to 5%. Full statistics are unnecessary for small tables, but for large tables, complete statistics can use up a significant amount of CPU and Disk IO resources. In these instances, utilizing sample statistics can be a viable option.
COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);
Global temporary tables, join indices, and volatile tables lack statistics.
For more information on Teradata Statistics, please refer to Teradata Statistics: Everything You Need to Know.
[…] Teradata Collect Statistics Using Sample […]