Optimizing Workload with Teradata’s Sample Statistics Feature

Roland Wenzlofsky

April 22, 2023

minutes reading time


The Teradata Collect Statistics Using Sample feature explained

Utilizing statistical samples in a Teradata system can lessen the workload and produce execution plans equivalent to those resulting from complete statistics when implemented with care.

What are sample statistics? The optimizer utilizes a 2% row sample to gather demographic data information. The sample size can be adjusted by executing the statement below, and the new sample size will remain valid until the session ends:

DIAGNOSTIC "COLLECTSTATS, SAMPLESIZE=n" ON FOR SESSION;

When can sample statistics be applied effectively? Ensuring the table is not skewed is crucial in obtaining accurate sample statistics. To determine skewness, execute the following query:

SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)
FROM <TABLE>
GROUP BY 1;

Sample statistics are adequate for skew values that are less than or equal to 5%. Full statistics are unnecessary for small tables, but for large tables, complete statistics can use up a significant amount of CPU and Disk IO resources. In these instances, utilizing sample statistics can be a viable option.

COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);

Global temporary tables, join indices, and volatile tables lack statistics.

For more information on Teradata Statistics, please refer to Teradata Statistics: Everything You Need to Know.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >