Teradata Statistics – the new Syntax in Teradata 14.00
Starting with Release 14.00 there are several improvements for Teradata statistics which are worth being described in detail.
First of all, you should get comfortable with the new syntax, which you should use from now on. This article will cover the new syntax and discuss its advantages which are available starting with Teradata 14.00
COLLECT STATISTICS COLUMN (A), COLUMN(B), … , COLUMN(Z) ON TABLE_NAME;
As you can see, with the new syntax you can group several collect statistics requests for one table into only one request. It is not the new syntax alone which catches our attention (you can still use the old syntax if you like, even it is not recommended) but the improvement in performance and resource usage.
Whenever you use the new syntax and are grouping together statistics collections on columns, the Teradata optimizer tries to optimize your request in a way that the table scans are being minimized. Let’s look at an example:
COLLECT STATISTICS COLUMN(A,B), COLUMN(A), COLUMN(B);
In above example, the optimizer may just need one table scan to build the statistics for column A, column B, and columns (A, B). From a technical point of view, first the statistics for (A, B) will be calculated, and the statistics for A and B will be calculated by reusing the spool which was created by calculating the statistics for (A, B).
Although this approach is not chosen always (or only starting with the second time these particular statistics are refreshed), ou will recognize the huge improvements in performance and resource usage possible. Keep in mind that the old syntax would have issued 3 table scans!
It is highly recommended to rewrite your existing collect statistic statements into the new syntax!