Introduction to Teradata Sample Statistics
In this article, I will show you when, how, and why you can/should use Teradata Sample Statistics.
What are Teradata Sample Statistics?
Sample statistics take only a percentage of rows into account when producing estimates.
What are the Advantages of Sample Statistics?
Sample Statistics are used for performance tuning. Collecting full statistics requires that a full table scan be performed on the table. Then the rows must be sorted to count the number of different values.
Sample statistics can prevent this.
When is it worth using Sample Statistics?
If the tables are large and the resource consumption causes performance problems by collecting the statistics.
The rows should be evenly distributed across all AMPs.
Especially suitable for this are e.g., indexed columns. Below is a
When shouldn’t you consider Sample Statistics?
Full statistics should always be collected on small tables. Sample statistics are not a general substitute for full statistics! The figure shows a skewed table that is not suitable for sample statistics:
How is the Size of the Sample determined?
With SAMPLE or SYSTEM SAMPLE, the sample size is determined by the optimizer. The first time full statistics are collected, then the optimizer switches to Sample Statistics and reduces the sample size until it detects that Skew is a problem.
If you want to determine the size of the sample yourself, use COLLECT STATISTICS USING SAMPLE n PERCENT.
Do I have to specify the desired Collection Method when the Statistics are collected again?
The simple COLLECT STATISTICS without specifying the method again is sufficient. Teradata remembers the last used method (full or sample) and the sample size for the sample statistics.
When using sample statistics, the columns must have as many different values as possible. For example, a UPI fulfills this requirement, but the NUPI can also be used if it has as many distinct values as possible.
Unusable for sample statistics are skewed columns
The Syntax for Sample Statistics
Use the sample size determined by Teradata:
COLLECT STATISTICS USING SYSTEM SAMPLE COLUMN (CustomerId) ON Customer;
Use a sample size of 5% of the rows of the table customer
COLLECT STATISTICS USING SYSTEM 5 PERCENT COLUMN (CustomerId) ON Customer;