2

Teradata Statistics – the new Syntax in Teradata 14.00

Starting with Release 14.00 there are several improvements for Teradata statistics which are worth being described in detail.

First of all, you should get comfortable with the new syntax, which you should use from now on. This article will cover the new syntax and discuss its advantages which are available starting with Teradata 14.00

COLLECT STATISTICS COLUMN (A), COLUMN(B), … , COLUMN(Z) ON TABLE_NAME;

As you can see, with the new syntax you can group several collect statistics requests for one table into only one request. It is not the new syntax alone which catches our attention (you can still use the old syntax if you like, even it is not recommended) but the improvement in performance and resource usage.

Whenever you use the new syntax and are grouping together statistics collections on columns, the Teradata optimizer tries to optimize your request in a way that the table scans are being minimized. Let’s look at an example:

COLLECT STATISTICS COLUMN(A,B), COLUMN(A), COLUMN(B);

In above example, the optimizer may just need one table scan to build the statistics for column A, column B, and columns (A, B). From a technical point of view, first the statistics for (A, B) will be calculated, and the statistics for A and B will be calculated by reusing the spool which was created by calculating the statistics for (A, B).

Although this approach is not chosen always (or only starting with the second time these particular statistics are refreshed), ou will recognize the huge improvements in performance and resource usage possible. Keep in mind that the old syntax would have issued 3  table scans!

Conclusion :

It is highly recommended to rewrite your existing collect statistic statements into the new syntax!

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Sreenivas says:

    Thank you for your explanation.
    as you said the example for collect statistics new syntax :
    COLLECT STATISTICS COLUMN(A,B), COLUMN(A), COLUMN(B);

    how it would be in Old Syntax and how the optimizer will work for Old Syntax.

  • Roland Wenzlofsky Roland Wenzlofsky says:

    In previous versions, each collect statement required a separate full table scan, but synchronized scaning was always possible (i.e. using a common spool for building the statistics) by issuing the statements in parallel:

    COLLECT STATISTICS COLUMN(A);
    COLLECT STATISTICS COLUMN(B);
    COLLECT STATISTICS COLUMN(A,B);

  • >