Teradata Collect Statistics Using Sample

0
802

 

The Teradata Collect Statistics Using Sample feature explained

Collecting sample statistics on a Teradata system helps to cut workload. If used cautiously, it allows creating as good execution plans as full statistics.

What is sample statistics? The optimizer will use a sample of 2% of the rows to get information about the data demographics. We can easily change the sample size, by running the statement below (The new sample size is valid until you log off your session):

DIAGNOSTIC “COLLECTSTATS, SAMPLESIZE=n” ON FOR SESSION;

When can sample statistics be applied? For sample statistics, it’s crucial, that the table is not skewed. Skewing can be found by executing below query:

SELECT HASHAMP (HASHBUCKET (HASHROW ())) , COUNT (*)

FROM <TABLE>

GROUP BY 1;

 

If the skew is small (<= 5%), sample statistics can be used. I would not be concerned about collecting full statistics on small tables, but a collection of comprehensive statistics on big tables consumes a lot of CPU and Disk IOs and should be considered as sample statistics candidates:

COLLECT STATISTICS USING SAMPLE ON <TABLE> COLUMN (<COLUMN>);

Sample statistics are not possible on GLOBAL TEMPORARY TABLES, JOIN INDICES and VOLATILE TABLES.

See also:
All you need to know about Teradata Statistics

Our Reader Score
[Total: 7    Average: 3.9/5]
Teradata Collect Statistics Using Sample written by Roland Wenzlofsky on March 10, 2014 average rating 3.9/5 - 7 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here