Optimizing Performance With Teradata Summary Statistics

Teradata summary statistics were introduced in Release 14 to give the Optimizer vital information on table row counts. These statistics are valuable for performance tuning as they can be efficiently gathered and are crucial in creating an execution plan. They are especially useful when collecting column statistics requires significant resources. The syntax for collecting summary statistics is simple:

COLLECT SUMMARY STATISTICS ON <TABLE>;

Teradata summary statistics include the number of rows, average data block size, and average row size per table. The Teradata Optimizer heavily relies on the number of rows per table to approximate the cost of scanning a table, underscoring its critical importance.

In previous Teradata versions lacking summary statistics, gathering statistics on the “PARTITION” dummy column and primary index columns was imperative.

When summary statistics are defined for a table, they are refreshed whenever any column statistics are updated. To optimize resource usage, it is recommended to update all column statistics at once, as shown below:

COLLECT STATISTICS ON <TABLE>; -- all column statistics and summary statistics are refreshed.

Although less resource-intensive than traditional column/index statistics, one should still avoid the following approach:

COLLECT STATISTICS ON <TABLE> <COLUMN_1>; -- first collection of summary statistics
...
COLLECT STATISTICS ON <TABLE> <COLUMN_N>; -- needless nth collection of summary statistics

Teradata summary statistics serve as a great starting point for gathering table-level statistics. When trying to update table statistics for a table lacking any column statistics, an error will arise:

COLLECT STATISTICS ON <TABLE>;
COLLECT failed. [3624] There are no statistics defined for the table.

Efficiently generating preliminary summary statistics for all tables can resolve this issue.

COLLECT SUMMARY STATISTICS ON <TABLE>;

COLLECT STATISTICS ON <TABLE>;
Collect complete. 1 rows processed

Teradata Statistics FAQ

What are Teradata statistics used for?

Teradata statistics are essential for the optimizer to generate efficient and accurate query execution plans. By collecting statistics on tables, columns, and indexes, the optimizer can better understand the data distribution, cardinality, and uniqueness within the database. This information helps the optimizer make informed decisions about join methods, access paths, and resource allocation when processing queries. In essence, Teradata statistics are crucial for improving query performance and ensuring optimal system resource usage.

How to show statistics in Teradata?

To show statistics in Teradata, you can use the HELP STATISTICS command. This command displays the available statistics for a specific table, index, or column. The syntax for the HELP STATISTICS command is as follows:

HELP STATISTICS database_name.table_name;

Replace database_name and table_name with the appropriate database and table names for which you want to display the statistics. You can also specify a particular column or index by appending it to the table name like this:

HELP STATISTICS database_name.table_name.column_name;

How often do you collect statistics in Teradata?

The frequency of collecting statistics in Teradata depends on various factors, such as the nature of the data, the rate of data changes, and the impact on query performance. As a general guideline, you should consider collecting statistics when:

A table is newly created or loaded with data.
A significant amount of data has been added, updated, or deleted (typically around 10% or more of the table’s total data).
You observe a decline in query performance or the optimizer starts generating suboptimal execution plans.

You may need to collect statistics more often for tables with frequently changing data. Conversely, you can collect statistics less frequently for tables with relatively static data.

Why do we collect stats in Teradata?

Collecting statistics in Teradata is essential for maintaining optimal query performance and resource utilization. By gathering information on data distribution, cardinality, and uniqueness, the optimizer can make informed decisions when creating query execution plans. Collecting statistics helps the optimizer:

Choose the most efficient join methods and access paths for the given data.
Estimate the number of rows each operation will process and return, allowing for better resource allocation and parallelism.
Determine the cost of different execution plan options, leading to selecting the most efficient plan.

In summary, collecting stats in Teradata is critical for ensuring efficient query execution and maximizing system performance.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

4 thoughts on “Optimizing Performance with Teradata Summary Statistics”

Roland Wenzlofsky

03/30/2015 at 7:31 pm

Hi Sreeraj. I think this is a misunderstanding.

“All columns at once” means to collect statistics with the new syntax which allows collecting the stats on different columns of the same table at once.

Before Teradata 14 we had to collect stats column by column i.e. issue one statement per column. The only performance advantage previously possible was synchronized scanning on the spool (this was achieved by starting the collect statistics statements for one table at the same time).

“All columns at once” does not mean to collect statistics on each and every column of a table…

Roland
Sreeraj

03/30/2015 at 7:23 pm

Thank you. very informative.
Question: for TD 14 onwards, the best practice is to collect all column statistics at once instead of column by column. is this true for big tables as well? I was under the impression that we need to collect stats only for certain columns based on the PI/join and WHERE. could you please clarify?
Johne490

08/20/2014 at 6:23 am

Very informative post.Really thank you! Awesome.
Dimitrios

05/07/2014 at 11:07 pm

Hello,

This new feature has only advantages, except from one thing, the way that we have to change the procedure which collects the stats in order to use this feature(s) (plus the max value length/max internal and sample ).

It is also recommended to big tables, where collecting stats consumes a lot of io/CPU, just
Collect Summary Stats on DB.TB, after every DML transaction on it, in order to have updated summary stats on the table, in this way we help the parser to make better extrapolations.

Optimizing Performance with Teradata Summary Statistics

Teradata Statistics FAQ

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

4 thoughts on “Optimizing Performance with Teradata Summary Statistics”

Leave a Comment Cancel reply