fbpx

 

Summary statistics are a handy feature, which is new in Teradata Release 14. Contrary to traditional statistics, which are collected on columns or indexes, summary statistics are table-level statistics:

COLLECT SUMMARY STATISTICS ON <TABLE>;

Table level statistics include the number of rows per table, the average data block sizes, and the average row sizes.

The number of rows per table is a critical measure for the Teradata Optimizer to estimate the cost of a full table scan. So, how was this information made available in previous releases of Teradata?

In earlier versions of Teradata, this was achieved by collecting column statistics on the dummy “PARTITION” and the Primary Index columns.

Summary statistics defined on a table will be refreshed each time the statistics on any column of the table are updated. If you want to avoid wasting resources, the best practice is to update all column statistics at once instead of column by column:

The excellent solution is this one:

COLLECT STATISTICS ON <TABLE>; — all column statistics are refreshed. Summary statistics are only updated once.

Although less intense on resources than traditional column/index statistics, this should still be avoided:

COLLECT STATISTICS ON <TABLE> <COLUMN_1>; — first collection of summary statistics

COLLECT STATISTICS ON <TABLE> <COLUMN_2>; — needless 2nd collection of summary statistics

Conclusion: Summary statistics collection is a handy feature to refresh the number of rows information per table quickly.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Hi Sreeraj. I think this is a misunderstanding.

    “All columns at once” means to collect statistics with the new syntax which allows collecting the stats on different columns of the same table at once.

    Before Teradata 14 we had to collect stats column by column i.e. issue one statement per column. The only performance advantage previously possible was synchronized scanning on the spool (this was achieved by starting the collect statistics statements for one table at the same time).

    “All columns at once” does not mean to collect statistics on each and every column of a table…

    Roland

  • Thank you. very informative.
    Question: for TD 14 onwards, the best practice is to collect all column statistics at once instead of a column by column. is this true for big tables as well? I was under the impression that we need to collect stats only for certain columns based on the PI/join and WHERE. could you please clarify?

  • Very informative post.Really thank you! Awesome.

  • Avatar
    Dimitrios says:

    Hello,

    This new feature has only advantages, except from one thing, the way that we have to change the procedure which collects the stats in order to use this feature(s) (plus the max value length/max internal and sample ).

    It is also recommended to big tables, where collecting stats consumes a lot of io/CPU, just
    Collect Summary Stats on DB.TB, after every DML transaction on it, in order to have updated summary stats on the table, in this way we help the parser to make better extrapolations.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >