Teradata vs. Snowflake: Why GROUP BY Performance Differs at Scale

DWH Pro Admin

September 27, 2025

When migrating analytical workloads from Teradata to Snowflake, one subtle but important performance factor often gets overlooked: how the two systems handle GROUP BY operations on huge tables.

The SQL looks the same, but the execution engines behave differently. If you’ve relied on Teradata for years, you may be surprised by Snowflake’s behavior.


GROUP BY in Database Systems

Most databases implement GROUP BY with either:

  • Hash aggregation: rows are placed into an in-memory hash table keyed by the grouping columns.
  • Sort aggregation: rows are sorted, then consecutive keys are aggregated.

Both Teradata and Snowflake presumably use hash aggregation as their primary strategy, since this is the standard approach in large-scale systems, although Snowflake does not disclose its full internals.
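
As a running example, consider a simple aggregation over a hypothetical sales table. Either engine is free to satisfy it with a hash table or a sort; the SQL itself gives no hint which strategy the planner picks:

    -- Hypothetical sales table. Both engines would typically build an
    -- in-memory hash table keyed on customer_id (hash aggregation),
    -- falling back to sort-based aggregation only in special cases.
    SELECT customer_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_cnt
    FROM sales
    GROUP BY customer_id;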


Teradata’s Approach

  • Low cardinality: AMPs can aggregate locally, sending only a small set of partial results across the BYNET.
  • High cardinality: pre-aggregation is ineffective; rows are redistributed to AMPs by group key before aggregation, creating network and spool pressure.
  • Primary Index advantage: if the grouping columns match the PI, no redistribution is needed; aggregation stays local to each AMP, which is the fastest case (see the sketch after this list).
  • Skew sensitivity: uneven group distribution can overload a single AMP, leading to slow queries or spool failures.
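
A minimal Teradata sketch of the PI effect, with hypothetical table and column names. Because rows are hashed to AMPs by the Primary Index, grouping on the PI column confines every group to a single AMP:

    -- Teradata DDL sketch: rows are distributed to AMPs by hashing the PI.
    CREATE TABLE sales
    (
        customer_id INTEGER NOT NULL,
        sale_date   DATE,
        amount      DECIMAL(12,2)
    )
    PRIMARY INDEX (customer_id);

    -- Grouping on the PI column: every group lives on exactly one AMP,
    -- so each AMP aggregates its own rows and no redistribution occurs.
    SELECT customer_id, SUM(amount)
    FROM sales
    GROUP BY customer_id;

    -- Grouping on a non-PI column: rows must be redistributed by the
    -- group key first, the more expensive path described above.
    SELECT sale_date, SUM(amount)
    FROM sales
    GROUP BY sale_date;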

Snowflake’s Approach

Snowflake applies a consistent multi-stage process:

  1. Local partial aggregation per node.
  2. Redistribution by group key.
  3. Final aggregation.

Differences to note:

  • No PI equivalent: redistribution is almost always required.
  • Memory and spilling: intermediate results spill to local SSD if they don’t fit in memory; SSD is finite, so extreme cases can fail.
  • Scalability: warehouses can be resized to add memory and SSD capacity, a tuning lever that Teradata does not offer (see the example after this list).
  • Skew handling: uneven groups still slow queries, but adaptive execution introduced in 2024 helps rebalance work across nodes.
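
When an aggregation runs out of headroom, resizing the warehouse is the usual first lever. The warehouse name below is hypothetical:

    -- Snowflake: resize the warehouse to give each query more memory
    -- and local SSD to work with.
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

    -- Scale back down once the heavy aggregation has finished,
    -- since larger sizes consume credits faster.
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';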

Disk Spilling

Neither system has infinite memory:

  • Teradata spills to spool; if quotas are exceeded, queries fail.
  • Snowflake spills to local SSD; queries usually just run slower, though disk exhaustion can cause further spilling to remote object storage (the query after this list shows how to spot both).
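
Assuming access to the SNOWFLAKE.ACCOUNT_USAGE share, a query along these lines can surface recent spilling (note that this view can lag by up to 45 minutes):

    -- Snowflake: find recent queries that spilled to local SSD or,
    -- worse, to remote object storage.
    SELECT query_id,
           bytes_spilled_to_local_storage,
           bytes_spilled_to_remote_storage
    FROM snowflake.account_usage.query_history
    WHERE bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0
    ORDER BY start_time DESC
    LIMIT 50;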

Migration Implications

  • Low-cardinality GROUP BY or PI-based queries that ran extremely fast in Teradata may lose their edge in Snowflake, since redistribution is part of the plan.
  • High-cardinality or skewed queries that often failed in Teradata tend to run more predictably in Snowflake, though sometimes slower.
  • Scaling in Snowflake gives you a way out of memory pressure, at the cost of higher credits; Teradata offers no such flexibility.
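
A simple pre-cutover check is to run the same aggregation through both optimizers and compare what each one plans to do; the table name is again hypothetical:

    -- Teradata: EXPLAIN shows whether aggregation is computed locally
    -- on the AMPs or only after a redistribution step.
    EXPLAIN
    SELECT customer_id, SUM(amount)
    FROM sales
    GROUP BY customer_id;

    -- Snowflake: EXPLAIN returns the logical plan; the query profile in
    -- the web UI additionally shows partial vs. final aggregation nodes.
    EXPLAIN
    SELECT customer_id, SUM(amount)
    FROM sales
    GROUP BY customer_id;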

Final Thought

Cardinality, skew, and memory limits strongly influence aggregation performance.

  • Teradata: the optimizer picks a strategy based on cardinality and PI design, but queries remain highly sensitive to skew and spool quotas.
  • Snowflake: always redistributes and aggregates in stages, presumably hash-based, with adaptive execution and warehouse scaling as safety valves.

For migration teams, the lesson is simple: test your GROUP BY queries. What ran best in Teradata may behave differently in Snowflake, especially at the extremes of very low or very high cardinality.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>