In the past, there have been numerous discussions regarding which statement performs better:

SELECT <COLUMN> FROM <TABLE> GROUP BY 1
or
SELECT DISTINCT <COLUMN> FROM <TABLE>

Many personal experiences have been shared, but they often confuse cause and effect: people typically build a single test scenario and draw general conclusions from it.

The guessing is over now. The truth is:

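Which statement performs better depends on the demographics of the data, above all on how many distinct values the grouping columns contain and how the rows are distributed across the AMPs.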

To decide when to use DISTINCT and when to use GROUP BY in Teradata, you need to understand how each statement is executed.

DISTINCT redistributes all rows to the AMPs responsible for them (based on the row hash of the selected columns) and eliminates duplicates there. GROUP BY, in contrast, first groups the rows locally on each AMP and only then redistributes the remaining rows.
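To make the difference between the two strategies concrete, here is a minimal Python sketch that models both plans on toy data. It is a simplified model, not Teradata's implementation: each AMP is a list of integer values, and the hypothetical `value % num_amps` stands in for Teradata's row-hash distribution. The function names `distinct_plan` and `group_by_plan` are illustrative; each returns the result set and the number of rows sent through the redistribution step.

```python
from typing import List, Tuple

def target_amp(value: int, num_amps: int) -> int:
    # Hypothetical stand-in for Teradata's row-hash -> AMP mapping.
    return value % num_amps

def distinct_plan(amps: List[List[int]], num_amps: int) -> Tuple[List[int], int]:
    """DISTINCT model: redistribute every row, then remove duplicates on the target AMP."""
    targets = [set() for _ in range(num_amps)]
    rows_redistributed = 0
    for local_rows in amps:
        for value in local_rows:  # every row is redistributed, duplicates included
            targets[target_amp(value, num_amps)].add(value)
            rows_redistributed += 1
    return sorted(v for t in targets for v in t), rows_redistributed

def group_by_plan(amps: List[List[int]], num_amps: int) -> Tuple[List[int], int]:
    """GROUP BY model: group locally on each AMP first, then redistribute the survivors."""
    targets = [set() for _ in range(num_amps)]
    rows_redistributed = 0
    for local_rows in amps:
        for value in set(local_rows):  # AMP-local grouping removes local duplicates
            targets[target_amp(value, num_amps)].add(value)
            rows_redistributed += 1
    return sorted(v for t in targets for v in t), rows_redistributed

# Three AMPs holding 12 rows but only two distinct values.
amps = [[1, 1, 2, 2], [1, 2, 2, 1], [2, 1, 1, 1]]
print(distinct_plan(amps, 3))  # ([1, 2], 12)
print(group_by_plan(amps, 3))  # ([1, 2], 6)
```

Both plans return the same result; only the number of redistributed rows differs.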

Once you understand these fundamental principles, choosing the right statement on a Teradata system is easy.

AMP-local aggregation brings no benefit if the grouping columns contain many distinct values (few or no duplicates), because hardly any rows are eliminated before the redistribution. In this case, use DISTINCT.

If the grouping columns contain only a few distinct values (many duplicates), use GROUP BY. This triggers the AMP-local grouping step, which reduces the number of rows that have to be redistributed to the AMPs for the final aggregation step.
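The rule of thumb in the two paragraphs above can be expressed as a small helper. This is a hypothetical sketch, not an optimizer formula: `choose_statement` and its `min_reduction` threshold (2.0 here) are illustrative assumptions. It compares the total number of rows with the number of rows that would remain after AMP-local grouping, i.e. the sum of the distinct values held by each AMP.

```python
from typing import List, Tuple

def choose_statement(rows_per_amp: List[int],
                     distinct_values_per_amp: List[int],
                     min_reduction: float = 2.0) -> Tuple[str, float]:
    """Hypothetical rule of thumb: prefer GROUP BY when AMP-local grouping
    shrinks the rows to redistribute by at least `min_reduction` times."""
    total_rows = sum(rows_per_amp)
    # Rows left after local grouping = distinct values held by each AMP.
    rows_after_local_grouping = sum(distinct_values_per_amp)
    if rows_after_local_grouping == 0:
        return "DISTINCT", 1.0  # empty input: nothing to gain from local grouping
    reduction = total_rows / rows_after_local_grouping
    statement = "GROUP BY" if reduction >= min_reduction else "DISTINCT"
    return statement, reduction

# Nearly unique grouping column: the reduction factor is only just above 1.0.
print(choose_statement([1000, 1000], [990, 995]))
# Only a handful of values: local grouping shrinks 2000 rows to 10.
print(choose_statement([1000, 1000], [5, 5]))  # ('GROUP BY', 200.0)
```

In practice, the distinct-value counts would come from statistics on the grouping columns; the threshold is a tuning knob, not a documented Teradata value.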

One comment:

High skew in the grouped columns can cause an “out of spool space” error on individual AMPs, because DISTINCT moves many rows to a single AMP or a few AMPs. In this scenario, use GROUP BY even if the data demographics would normally favor DISTINCT: the local grouping step shrinks the rows before they are redistributed.
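The skew effect can be illustrated with the same kind of toy model. The sketch below, again assuming a hypothetical `value % num_amps` distribution and an illustrative function name `incoming_rows_per_amp`, counts how many rows each AMP receives into spool during redistribution, with and without the local grouping step.

```python
from typing import List

def incoming_rows_per_amp(amps: List[List[int]], num_amps: int,
                          local_grouping: bool) -> List[int]:
    """Count rows each AMP receives during redistribution.
    local_grouping=False models DISTINCT, True models GROUP BY."""
    incoming = [0] * num_amps
    for local_rows in amps:
        # GROUP BY removes local duplicates before anything leaves the AMP.
        rows_to_send = set(local_rows) if local_grouping else local_rows
        for value in rows_to_send:
            incoming[value % num_amps] += 1  # hypothetical hash -> AMP mapping
    return incoming

# Four AMPs, each holding 1,000 rows of the dominant value 0 plus values 1, 2, 3.
skewed = [[0] * 1000 + [1, 2, 3] for _ in range(4)]
print(incoming_rows_per_amp(skewed, 4, local_grouping=False))  # [4000, 4, 4, 4]
print(incoming_rows_per_amp(skewed, 4, local_grouping=True))   # [4, 4, 4, 4]
```

With DISTINCT, AMP 0 has to hold 4,000 rows in spool before duplicates are removed; with local grouping it receives only 4.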

I hope this has cleared up most of the guesswork. There is no overall winner between DISTINCT and GROUP BY.

  • For Teradata 14.0, 14.10, and 15, these two are basically the same thing.
