1

Teradata Collect Statistics Using Sample

The Teradata Collect Statistics Using Sample feature explained

Collecting sample statistics on a Teradata system helps to cut workload. If used cautiously, it allows creating as good execution plans as full statistics.

What is sample statistics? The optimizer will use a sample of 2% of the rows to get information about the data demographics. We can easily change the sample size, by running the statement below (The new sample size is valid until you log off your session):

DIAGNOSTIC “COLLECTSTATS, SAMPLESIZE=n” ON FOR SESSION;

When can sample statistics be applied? For sample statistics, it’s crucial, that the table is not skewed. Skewing can be found by executing below query:

SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)

FROM <TABLE>

GROUP BY 1;

If the skew is small (<= 5%), sample statistics can be used. I would not be concerned about collecting full statistics on small tables, but a collection of comprehensive statistics on big tables consumes a lot of CPU and Disk IOs and should be considered as sample statistics candidates:

COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);

Sample statistics are not possible on GLOBAL TEMPORARY TABLES, JOIN INDICES and VOLATILE TABLES.

See also:
All you need to know about Teradata Statistics

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • […] Teradata Collect Statistics Using Sample Collect Statistics in Teradata – Part 2 Teradata Statistics Basics […]

  • >