Teradata Vs. Snowflake: Why GROUP BY Performance Differs At Scale

When migrating analytical workloads from Teradata to Snowflake, one subtle but important performance factor often gets overlooked: how the two systems handle GROUP BY operations on huge tables.

The SQL looks the same, but the execution engines behave differently. If you’ve relied on Teradata for years, you may be surprised by Snowflake’s behavior.

Table of Contents

GROUP BY in Database Systems

Most databases implement GROUP BY with either:

Hash aggregation: rows are placed into an in-memory hash table keyed by the grouping columns.
Sort aggregation: rows are sorted, then consecutive keys are aggregated.

We assume both Teradata and Snowflake use hash aggregation as their main strategy, since this is standard in large-scale systems, though Snowflake does not disclose full internals.

Teradata’s Approach

Low cardinality: AMPs can aggregate locally, sending only a small set of partial results across the BYNET.
High cardinality: pre-aggregation is ineffective; rows are redistributed to AMPs by group key before aggregation, creating network and spool pressure.
Primary Index advantage: if grouping is on the PI, no redistribution is needed — aggregation is local, which is the fastest case.
Skew sensitivity: uneven group distribution can overload a single AMP, leading to slow queries or spool failures.

Snowflake’s Approach

Snowflake applies a consistent multi-stage process:

Local partial aggregation per node.
Redistribution by group key.
Final aggregation.

Differences to note:

No PI equivalent: redistribution is almost always required.
Memory and spilling: intermediate results spill to local SSD if they don’t fit in memory; SSD is finite, so extreme cases can fail.
Scalability: warehouses can be resized to add memory and SSD capacity — a tuning lever absent in Teradata.
Skew handling: uneven groups still slow queries, but adaptive execution introduced in 2024 helps rebalance work across nodes.

Disk Spilling

Neither system has infinite memory:

Teradata spills to spool; if quotas are exceeded, queries fail.
Snowflake spills to SSD; usually queries just run slower, though disk exhaustion can cause further spilling to object storage.

Migration Implications

Low-cardinality GROUP BY or PI-based queries that ran extremely fast in Teradata may lose their edge in Snowflake, since redistribution is part of the plan.
High-cardinality or skewed queries that often failed in Teradata tend to run more predictably in Snowflake, though sometimes slower.
Scaling in Snowflake gives you a way out of memory pressure, at the cost of higher credits; Teradata offers no such flexibility.

Final Thought

Cardinality, skew, and memory limits strongly influence aggregation performance.

Teradata: optimizer picks strategies based on cardinality and PI design, but queries are highly sensitive to skew and spool quotas.
Snowflake: always redistributes and aggregates in stages, assumed to be hash-based, with adaptive execution and scaling as safety valves.

For migration teams, the lesson is simple: test your GROUP BY queries. What ran best in Teradata may behave differently in Snowflake, especially at the extremes of very low or very high cardinality.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

GROUP BY in Database Systems

Teradata’s Approach

Snowflake’s Approach

Disk Spilling

Migration Implications

Final Thought

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

Leave a Comment Cancel reply