fbpx

Teradata DISTINCT vs. GROUP BY

By Roland Wenzlofsky

March 10, 2014


From a result set point of view, it does not matter if you use DISTINCT or GROUP BY in Teradata. The answer set will be the same.

From a performance point of view, it is not the same.

To understand what impacts performance, you need to know what happens on Teradata when executing a statement with DISTINCT or GROUP BY.

In the case of DISTINCT, the rows are redistributed immediately without any preaggregation taking place, while in the case of GROUP BY, in a first step, a preaggregation is done, and only then are the unique values redistributed across the AMPs.

Don’t get fooled by my statement above and think now that GROUP BY is always better from a performance point of view. When you have many different values, the preaggregation step of GROUP BY is not very efficient. Teradata has to sort the data to remove duplicates. In this case, it may be better to the redistribution first, i.e., use the DISTINCT statement. Only if there are many duplicate values, the GROUP BY statement is probably the better choice as only once the deduplication step takes place after redistribution.

In short, DISTINCT vs. GROUP BY in Teradata means:

  • GROUP BY   ->  for many duplicates
  • DISTINCT    -> no or a few duplicates only

At times, when using DISTINCT, you run out of spool space on an AMP.  The reason is that redistribution takes place immediately, and skewing could cause AMPs to run out of space.

If this happens, you probably have a better chance with GROUP BY, as duplicates are already removed in a first step, and less data is moved across the AMPs.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Avatar
    MALIK HASSAN QAYYUM says:

    In case of Distinct Clause, Rows are distributed immediately without any pre-aggregation. I got this point. I know what redistribution is and how it happens behind the scene.

    In the case of GROUP BY, in a first step a preaggregation is done and only then are the unique values redistributed across the AMPs.

    I am confused what really preaggregation are here? and How they are done? Any Explanation Thanks.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >