Snowflake’s physical join execution is predominantly hash-based. In practice you’ll observe hash-join variants with two distributions:
- Broadcast hash joins — replicate a small build side to all workers (warehouse nodes) and join locally.
- Partitioned (shuffle) hash joins — repartition both inputs by the join key so matching keys land on the same worker and can be joined locally.
If you come from Teradata, the intent will feel familiar: both systems aim to co-locate equal keys before matching. This article explains Snowflake’s strategies, maps them to Teradata’s (including dynamic plan fragments), and shows how to recognize and mitigate skew—without favoring either platform. Other specialized strategies may exist or emerge, but what you’ll consistently observe today in plans and profiles are hash-based joins with broadcast or partitioned (shuffle) distribution.
How Snowflake’s hash joins work
- Build vs. probe. Snowflake chooses a build side (typically the smaller, filtered input), builds an in-memory hash table on the join equality key(s), and probes it with the other side.
Note: Hash joins rely on equi-join keys; non-equality predicates are applied as extra filters around the hash match.
- Broadcast is chosen when the build side is genuinely tiny. Each worker joins its local chunk of the large table with a full copy of the small one, avoiding a big shuffle.
- Partitioned (shuffle) is used when neither side is tiny. Both inputs are routed by a hash of the join key so rows that should meet are processed together locally.
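The two distributions can be sketched in miniature. This is an illustrative Python toy (all function names are ours, not Snowflake's; a real engine streams and spills rather than materializing lists):

```python
# Toy sketch of the two hash-join distributions (illustrative only).
from collections import defaultdict

def hash_partition(rows, key, n_workers):
    """Route each row to a worker by hashing its join key."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        parts[hash(row[key]) % n_workers].append(row)
    return parts

def local_hash_join(build_rows, probe_rows, key):
    """Build an in-memory hash table on the build side, probe with the other."""
    table = defaultdict(list)
    for b in build_rows:
        table[b[key]].append(b)
    return [{**p, **b} for p in probe_rows for b in table.get(p[key], [])]

def broadcast_hash_join(small, large, key, n_workers=4):
    # Every worker gets a full copy of the small side; the large side is
    # split without any key-based shuffle (round-robin here).
    chunks = [large[i::n_workers] for i in range(n_workers)]
    return [r for chunk in chunks for r in local_hash_join(small, chunk, key)]

def shuffle_hash_join(left, right, key, n_workers=4):
    # Both sides are repartitioned by the join key so matches co-locate.
    lp = hash_partition(left, key, n_workers)
    rp = hash_partition(right, key, n_workers)
    return [r for lc, rc in zip(lp, rp) for r in local_hash_join(lc, rc, key)]
```

Both strategies return the same rows; they differ only in what moves over the network (the small side once per worker vs. both sides hashed by key).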
Runtime adaptivity (why EXPLAIN and the actual run can differ)
Snowflake can adapt some join decisions at runtime after observing actual sizes, and it can coordinate choices across right-deep join pipelines to avoid memory over-commitment. You declare the logical join; Snowflake may refine the physical strategy in flight to reduce network and improve stability. Behavior can vary by workload and release.
Teradata parallels—and differences that matter
- Duplicate vs. Broadcast
Teradata duplicates a small table to all AMPs; Snowflake broadcasts the small side to all workers. Same goal: avoid moving the big table when the other side is tiny.
- Redistribute vs. Shuffle
Teradata redistributes rows to AMPs by the join hash; Snowflake shuffles rows to workers by the join hash. Same goal: co-locate equal keys for a local hash match.
- PI-driven locality (Teradata strength)
When tables share an aligned Primary Index, Teradata can execute many joins locally on each AMP with little or no redistribution—a design-time lever. Snowflake chooses co-location at execution time via its distribution strategy. Different levers; same goal.
- Join algorithm palette
Snowflake standardizes on hash-based execution. Teradata exposes multiple physical join methods (hash, merge, nested) and selects among them using statistics, ordering, and cost.
- Dynamic execution, not just static plans (Teradata)
Beyond plan-time optimization, Teradata supports features such as Incremental Planning/Execution (plan fragments) and Dynamic EXPLAIN in certain releases/configurations. These interleave execution with feedback to refine later fragments—e.g., different join choices or partition elimination—based on observed cardinalities. In other words, Teradata also has adaptive behavior, not only fixed plans.
- Fallbacks when memory is tight
Both engines can use partitioned/grace hash techniques and may spill; performance degrades versus in-memory builds, but queries still complete.
Skew happens in both systems
Skew means a small set of join-key values (“heavy hitters”) dominates. After redistribution/shuffle, one AMP/worker receives a disproportionate share and becomes the long tail.
Common causes:
- Heavy-hitter keys such as 0, UNKNOWN, or “N/A”.
- Wrong build choice (the “small” side isn’t small after filters).
- Uneven filtering (one side reduces sharply, the other barely).
- Fan-out from many duplicates on the join key.
Why more hardware doesn’t fix skew: the slowest, overloaded partition dictates wall-clock time.
- Snowflake: a larger warehouse adds CPU/RAM, but the long-tail partition still gates the query.
- Teradata: adding nodes/AMPs doesn’t neutralize a single hot key that hashes to one AMP. It can improve concurrency and headroom for non-skewed steps, but the heavy hitter remains the bottleneck unless you change data shape or plan.
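A toy illustration of why the hot key follows you: rows with the same key hash to the same worker no matter how many workers you add, so the largest partition barely shrinks (Python, illustrative only):

```python
# A single hot key lands on one worker at every worker count, so the
# largest partition, which gates wall-clock time, never drops below the
# heavy hitter's row count.
from collections import Counter

def partition_sizes(keys, n_workers):
    """Row count per worker after hashing each key to a worker."""
    sizes = Counter(hash(k) % n_workers for k in keys)
    return [sizes.get(w, 0) for w in range(n_workers)]

# 90,000 rows share one heavy-hitter key; 10,000 rows have distinct keys.
keys = ["HOT"] * 90_000 + [f"k{i}" for i in range(10_000)]

for n in (4, 16, 64):
    # max(...) stays >= 90,000 regardless of n: scaling out added lanes,
    # but every HOT row still drives into the same one.
    print(n, max(partition_sizes(keys, n)))
```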
How to recognize skew
In Teradata
- Viewpoint shows skewed steps, AMP CPU imbalance, and spool hotspots.
- EXPLAIN highlights heavy redistribution and large estimated row counts on specific steps.
- DBQL/ResUsage quantify per-step skew, spool, and CPU usage.
In Snowflake
- Query Profile lets you verify for each join:
  - The selected distribution (Broadcast vs. Partitioned).
  - Exchange/repartition time before the join.
  - Spills on the join step.
  - Row/byte balance across partitions (watch the largest-vs-median ratio).
- Account Usage surfaces stage-level bytes/rows over time to spot recurring patterns and regressions.
Note: Query Profile isn’t a 1:1 analogue of AMP-level counters; pair it with Account Usage to track bytes/rows per stage, spills, and exchange costs over time.
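The largest-vs-median check is easy to script against per-partition row counts read off a profile. A rough sketch (Python; `skew_ratio` is our own name, not a Snowflake metric):

```python
# Flag join skew from per-partition row counts, e.g. numbers read off a
# Query Profile exchange step. A ratio near 1.0 means balanced partitions.
from statistics import median

def skew_ratio(partition_rows):
    """Largest partition divided by the median partition."""
    return max(partition_rows) / max(median(partition_rows), 1)

balanced = [1000, 980, 1015, 995]
skewed   = [1000, 990, 52_000, 1005]

assert skew_ratio(balanced) < 1.1   # healthy
assert skew_ratio(skewed) > 10      # a heavy hitter dominates one worker
```

Track this ratio before and after each change; if it doesn't fall, the change didn't address the skew.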
Mitigating skew: shared principles and platform-specific tools
Principles that work on both platforms
- Filter early on the would-be build side so it is genuinely small.
- Project only needed columns on the build path to keep rows narrow.
- Pre-aggregate / de-duplicate on the join key to reduce fan-out.
- Split the top-N keys and UNION results. Handle the heavy hitters in a separate branch; join the rest normally. This simple pattern is effective everywhere.
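The split-and-UNION pattern in miniature (a Python sketch with made-up helpers; in SQL this is two joins over disjoint key sets combined with UNION ALL):

```python
# Handle proven heavy hitters in a dedicated branch (often small enough
# to broadcast) and join the long tail normally; UNION ALL the results.
from collections import Counter, defaultdict

def hash_join(left, right, key):
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]

def split_union_join(left, right, key, top_n=1):
    # Identify the top-N most frequent join keys on the big side.
    hot = {k for k, _ in Counter(r[key] for r in left).most_common(top_n)}
    hot_branch  = hash_join([r for r in left if r[key] in hot],
                            [r for r in right if r[key] in hot], key)
    tail_branch = hash_join([r for r in left if r[key] not in hot],
                            [r for r in right if r[key] not in hot], key)
    return hot_branch + tail_branch   # the UNION ALL
```

Because the two branches cover disjoint key sets, the combined result is identical to the single join; only the distribution of work changes.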
Teradata-specific levers
- Statistics: keep column/join-key stats fresh so the optimizer chooses duplicate vs. redistribute wisely and can pick the most suitable physical join method.
- PRPD (Partial Redistribution/Partial Duplication): with accurate stats, the optimizer can automatically split work—duplicate heavy-hitter keys and redistribute the remainder into two tailored sub-plans, then combine results—no developer intervention required.
Snowflake-specific levers
- Adaptive join decisions at runtime: Snowflake can switch between broadcast and partitioned after observing actual sizes and can coordinate choices across right-deep pipelines to avoid memory over-commitment.
- Salting hot keys (advanced, targeted): add a small bucket only for proven heavy hitters to spread them across workers; keep bucket counts modest and roll up/remove salts immediately after the join. Validate correctness and performance before broad use.
- Right-size distribution: broadcast only when the build is comfortably tiny; otherwise accept partitioning and focus on reducing shuffle volume.
Pruning vs. distribution: Search Optimization and clustering don’t rebalance join distribution, but they can reduce the input to the join—sometimes enough that a broadcast becomes feasible. Treat them as pruning tools, not distribution tools.
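A minimal salting sketch, assuming a single known heavy-hitter key (all names are illustrative; in SQL you would add a salt column on the probe side, replicate matching build rows per bucket, and join on both key and salt):

```python
# Spread a proven heavy-hitter key across a few buckets on the probe side,
# replicate the matching build rows once per bucket, and join on
# (key, salt). Roll up / drop the salt immediately after the join.
import random
from collections import defaultdict

N_BUCKETS = 4
HOT_KEYS = {"HOT"}          # only proven heavy hitters get salted

def salt_probe(rows, key):
    return [{**r, "_salt": random.randrange(N_BUCKETS) if r[key] in HOT_KEYS else 0}
            for r in rows]

def salt_build(rows, key):
    # Hot build rows are replicated once per bucket so every salted probe
    # row still finds its match; cold rows keep salt 0.
    out = []
    for r in rows:
        buckets = range(N_BUCKETS) if r[key] in HOT_KEYS else (0,)
        out.extend({**r, "_salt": b} for b in buckets)
    return out

def join_on_key_and_salt(probe, build, key):
    table = defaultdict(list)
    for b in build:
        table[(b[key], b["_salt"])].append(b)
    return [{**p, **b} for p in probe for b in table.get((p[key], p["_salt"]), [])]
```

The trade-off is explicit: the build side grows by a factor of the bucket count for hot keys, in exchange for spreading the probe-side heavy hitter across workers. Keep bucket counts modest and verify row counts match the unsalted join.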
Monitoring playbook (Teradata habits → Snowflake tools)
- If Viewpoint showed AMP skew and spool spikes, in Snowflake open Query Profile and examine distribution, exchange time, bytes/rows per partition, and spills.
- If EXPLAIN in Teradata said redistribute, expect Partitioned (shuffle) in Snowflake; if it said duplicate, expect Broadcast.
- If statistics guided TD plans, in Snowflake constrain the build side by construction (earlier filters, narrower rows) so the optimizer naturally picks broadcast when appropriate.
What won’t fix skew (by itself)
- Upsizing compute
- Snowflake: a bigger warehouse raises memory/CPU but leaves the long-tail partition as the bottleneck.
- Teradata: adding nodes/AMPs often helps non-skewed stages and concurrency, but not a single heavy-hitter key.
- Search Optimization / clustering (Snowflake): excellent for pruning; they don’t redistribute join inputs.
- “More parallelism” without fixing data shape: if the key distribution is the problem, more lanes just create a bigger traffic jam.
When to stop optimizing
- If the join legitimately returns a material portion of the big table (e.g., double-digit percent of rows), a partitioned (shuffle) join is usually the right answer. Don’t chase broadcast; reduce upstream scan volume and keep rows narrow.
- If changes don’t lower exchange time, spills, or the largest-partition ratio in the profile, roll them back and target the actual constraint.
Key takeaways
- Snowflake’s joins are hash-based, realized as broadcast or partitioned (shuffle) strategies with runtime adaptivity; Teradata solves the same co-location problem with duplicate and redistribute, plus a richer join palette, PI-driven locality, and dynamic plan fragments that interleave optimization and execution.
- Skew can hurt both systems. Fix the data shape first: early selectivity, narrow build rows, pre-aggregate, and treat heavy hitters specially (split & UNION everywhere; PRPD in Teradata; targeted salting in Snowflake).
- Scaling out is not a skew cure: bigger Snowflake warehouses or more Teradata nodes don’t eliminate a single heavy-hitter bottleneck; the overloaded partition still gates runtime.
- Bring your Teradata monitoring instincts: swap Viewpoint/DBQL/ResUsage + EXPLAIN for Query Profile/Account Usage + EXPLAIN, and watch distribution, exchange time, rows/bytes per partition, and spills to validate each change.
This article reflects commonly observed platform behavior as of Q3 2025; capabilities and defaults can vary by edition, region, account settings, and release.