December 16

0 comments

Winning Tactics For Teradata Sample Statistics

By Roland Wenzlofsky

December 16, 2019

statistics

Introduction to Teradata Sample Statistics

In this article, I will show you when, how, and why you can/should use Teradata Sample Statistics.

What are Teradata Sample Statistics?

Sample statistics take only a percentage of rows into account when producing estimates.

What are the Advantages of Sample Statistics?

Sample Statistics are used for performance tuning. Collecting full statistics requires that a full table scan be performed on the table. Then the rows must be sorted to count the number of different values.
This can be prevented by sample statistics.

When is it worth using Sample Statistics?

If the tables are large and the resource consumption causes performance problems by collecting the statistics.
The rows should be evenly distributed across all AMPs.
Especially suitable for this are e.g. indexed columns. Below is a

When shouldn't you consider Sample Statistics?

Full statistics should always be collected on small tables. Sample statistics are not a general substitute for full statistics! The figure shows a skewed table that is not suitable for sample statistics:teradata-sample-statistics-skew

How is the Size of the Sample determined?

With SAMPLE or SYSTEM SAMPLE the sample size is determined by the optimizer. The first time full statistics are collected, then the optimizer switches to Sample Statistics and reduces the sample size until it detects that Skew is a problem.
If you want to determine the size of the sample yourself, use COLLECT STATISTICS USING SAMPLE n PERCENT.

Do I have to specify the desired Collection Method when the Statistics are collected again?

The simple COLLECT STATISTICS without specifying the method again is sufficient. Teradata remembers the last used method (full or sample) as well as the sample size for the sample statistics.

When using sample statistics, it is important that the columns have as many different values as possible. A UPI, for example, fulfills this requirement, but the NUPI can also be used if it has as many distinct values as possible.

Unusable for sample statistics are skewed columns

The Syntax for Sample Statistics

Use the sample size determined by Teradata:

COLLECT STATISTICS USING SYSTEM SAMPLE COLUMN (CustomerId) ON Customer;

Use a sample size of 5% of the rows of the table customer

COLLECT STATISTICS USING SYSTEM 5 PERCENT COLUMN (CustomerId) ON Customer;

Related posts:

Roland Wenzlofsky


Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Never miss a good story!

 Subscribe to our newsletter to keep up with the latest trends!

>