DISTINCT vs. GROUP BY: Understanding Performance Differences Based on Data Demographics in Teradata

For years, people have debated which of these two statements performs better:

SELECT <COLUMN> FROM <TABLE> GROUP BY 1

or

SELECT DISTINCT <COLUMN> FROM <TABLE>

Many people share their personal experiences, but these reports tend to misattribute the cause. Typically, someone builds a single test scenario and draws sweeping conclusions from it.

There is no need to speculate. The answer is this:

Which statement is faster depends on the demographics of the data.

Understanding how each statement is executed is the key to choosing between DISTINCT and GROUP BY in Teradata.

DISTINCT redistributes all rows to the responsible AMPs first and removes the duplicates there. GROUP BY first groups the rows locally on each AMP and then redistributes only the remaining rows.
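The difference between the two execution strategies can be illustrated with a small simulation. This is only a sketch, not Teradata code: the number of AMPs, the sample rows, and the `target_amp` function (a plain modulo standing in for Teradata's row hash) are assumptions made for illustration.

```python
from collections import defaultdict

N_AMPS = 4  # hypothetical number of AMPs


def target_amp(value):
    # Stand-in for Teradata's row hash: a plain modulo, purely illustrative.
    return value % N_AMPS


def run_distinct(amps):
    """Redistribute every row first, then remove duplicates on the target AMP."""
    moved = 0
    received = defaultdict(set)
    for rows in amps:
        for value in rows:
            received[target_amp(value)].add(value)  # duplicates collapse only after the move
            moved += 1
    return set().union(*received.values()), moved


def run_group_by(amps):
    """Group locally on each AMP first, then redistribute only the local groups."""
    moved = 0
    received = defaultdict(set)
    for rows in amps:
        local_groups = set(rows)  # AMP-local grouping step
        for value in local_groups:
            received[target_amp(value)].add(value)
            moved += 1
    return set().union(*received.values()), moved


# Four AMPs, each holding six rows of one column with only two distinct values.
amps = [[1, 2, 1, 2, 1, 2] for _ in range(N_AMPS)]

distinct_result, distinct_moved = run_distinct(amps)
group_result, group_moved = run_group_by(amps)

print(distinct_result == group_result)  # True: both strategies return the same values
print(distinct_moved, group_moved)      # 24 8
```

Both strategies return the same result. In this model they differ only in how many rows cross between AMPs.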

Once the basic principles are understood, it becomes simple to identify the appropriate statement to use on a Teradata system.

AMP-local aggregation does not pay off when the grouping columns contain many unique values, because few rows are eliminated before redistribution. In that case, use DISTINCT.

When the grouping columns contain only a handful of distinct values, use GROUP BY. The AMP-local grouping step then collapses most rows before they are redistributed, so far fewer rows reach the final aggregation step.
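A rough back-of-the-envelope calculation shows why the number of distinct values matters. The function below is a hypothetical helper, not a Teradata formula. It assumes that rows are spread evenly over the AMPs and, as a worst case for GROUP BY, that every distinct value occurs on every AMP.

```python
def rows_redistributed(total_rows, distinct_values, n_amps):
    """Rough count of rows each statement sends to the final aggregation step.

    Simplifying assumptions: rows are spread evenly over the AMPs, and every
    distinct value may occur on every AMP (worst case for GROUP BY).
    """
    rows_per_amp = total_rows // n_amps
    distinct_moves = total_rows  # DISTINCT redistributes every row
    # GROUP BY sends at most one row per local group from each AMP.
    group_by_moves = n_amps * min(rows_per_amp, distinct_values)
    return distinct_moves, group_by_moves


# Hypothetical table: 1,000,000 rows on 100 AMPs.
few_values = rows_redistributed(1_000_000, 10, 100)
many_values = rows_redistributed(1_000_000, 900_000, 100)

print(few_values)   # (1000000, 1000)
print(many_values)  # (1000000, 1000000)
```

With ten distinct values, local grouping shrinks the redistribution from a million rows to at most a thousand. With 900,000 distinct values, GROUP BY can move as many rows as DISTINCT in this worst case, so the local grouping pass is extra work with little to show for it.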

One caveat:

High skew in the grouped columns can cause an “out of spool space” error on a single AMP, because DISTINCT moves many rows to one AMP or a few AMPs. In this scenario, use GROUP BY even if DISTINCT would otherwise be the preferred choice.
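The skew problem can be sketched in the same simplified way. The AMP count, the sample data, and the modulo used in place of Teradata's row hash are all assumptions, and spool usage is approximated by the number of rows an AMP receives.

```python
from collections import Counter

N_AMPS = 4  # hypothetical number of AMPs


def target_amp(value):
    # Stand-in for Teradata's row hash: a plain modulo, purely illustrative.
    return value % N_AMPS


def rows_received(amps, local_grouping):
    """Count the rows each AMP receives after redistribution."""
    received = Counter()
    for rows in amps:
        # With local grouping (GROUP BY), each AMP sends one row per local value.
        outgoing = set(rows) if local_grouping else rows
        for value in outgoing:
            received[target_amp(value)] += 1
    return received


# Skewed column: value 0 dominates, and each AMP also holds ten unique values.
amps = [
    [0] * 1000 + list(range(amp * 10 + 1, amp * 10 + 11))
    for amp in range(N_AMPS)
]

distinct_load = rows_received(amps, local_grouping=False)
group_by_load = rows_received(amps, local_grouping=True)

print(max(distinct_load.values()))  # 4010 rows land on one AMP
print(max(group_by_load.values()))  # 14 rows on the busiest AMP
```

In this model, DISTINCT sends all 4,000 copies of the dominant value to the same AMP, while GROUP BY collapses them to one row per sending AMP before they move.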

I hope this answers most of your questions. There is no universal winner between DISTINCT and GROUP BY; the right choice depends on the data.

1 thought on “DISTINCT vs. GROUP BY: Understanding Performance Differences Based on Data Demographics in Teradata”

For 14.0, 14.10, and 15, these two are basically the same thing.
