Maximizing Efficiency: Teradata vs. Snowflake Partitioning

DWH Pro Admin

April 20, 2023

minutes reading time


Minimizing I/O in Teradata and Snowflake

All database systems aim to minimize physical I/O, as it remains the slowest component in query execution—even on SSD or cloud storage.


The Teradata Approach

Teradata distributes rows across AMPs based on the Primary Index, ensuring parallel access and load balancing.
Row Partitioning further reduces I/O by physically organizing rows by a partition expression—commonly date, region, or status—allowing the optimizer to read only the necessary partitions.

Choosing a partition expression aligned with common WHERE clauses or joins is critical.
However, partition elimination in Teradata applies only when predicates reference the partition columns.
To achieve pruning on multiple dimensions (e.g., date and customer_id), the table must be redefined with a multi-level partitioning scheme.

While Teradata supports Secondary Indexes and Join Indexes, these increase storage and DML overhead and are rarely used for high-volume tables.


The Snowflake Approach

Snowflake divides data into immutable micro-partitions (~16 MB), each storing columnar data plus metadata such as min/max values, null counts, and distinct counts.
This metadata allows automatic partition pruning—no explicit partition key required.

Snowflake’s advantage is that any column can benefit from pruning if its value range helps identify relevant micro-partitions.
In contrast, Teradata requires the partitioning columns to appear in the predicate.

However, Snowflake’s pruning efficiency depends on data clustering at load time.
If data is loaded in natural order (e.g., by date), pruning remains efficient.
If inserted randomly, performance can degrade until reclustering occurs.

Snowflake’s Clustering Service can automatically reorganize micro-partitions based on a defined cluster key, improving pruning but consuming compute resources.


Key Differences and Migration Guidelines

ConceptTeradataSnowflakeMigration Consideration
Partition DefinitionExplicit (requires DDL)Implicit (metadata-driven)Simplify schema; no partition keys needed
Multi-column PruningRequires multi-level partitionsAutomatic on any columnRedesign logic; fewer partition schemes
Physical LocalityRow-based, cylinder-orientedCloud object storageLoad data sorted by clustering dimension
MaintenanceManual partition designAuto pruning, optional reclusteringUse Clustering only for large, stable tables

Migration Guidelines: Teradata → Snowflake

Validate performance with SYSTEM$CLUSTERING_DEPTH and query profile partition stats.

Remove explicit ROW PARTITION BY clauses — Snowflake manages partitions automatically.

If predicates often filter by date or region, consider a CLUSTER BY (date, region) definition.

Avoid frequent DML on large clustered tables. Instead, use staging + MERGE patterns.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>