Migrating Bulk Loads from Teradata to Snowflake

DWH Pro Admin

September 23, 2025



Big files, real risks, and how not to overspend


On Snowflake, COPY INTO scales with the number of files, not the total GB. A single big file equals one unit of work. To go fast and cheap, publish many medium-sized parts (about 100–250 MB compressed), size the warehouse to the files in flight, and use auto-suspend and, when needed, multi-cluster for bursts.


Mindset shift: FastLoad vs. COPY INTO

  • Teradata FastLoad: One big file can still saturate the system by turning up sessions; the client feeds all AMPs in parallel.
  • Snowflake COPY INTO: Parallelism is file-level. In practice, one file maps to one execution slot—a single file won’t fan out across many threads. Throughput = files available concurrently × warehouse concurrency.

Consequence: If you bring a few giant files, a larger warehouse usually burns more credits without finishing faster. You must reshape files to unlock parallelism!
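
To make the file-level model concrete, here is a minimal sketch of a bulk load, assuming an illustrative ORDERS table, an external stage @orders_stage holding the exported parts under a date prefix, and Parquet files whose column names match the table:

  -- Each staged file is one unit of work; many parts under the prefix → many parallel load slots.
  COPY INTO orders
    FROM @orders_stage/2025-09-23/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

Whether this statement finishes in minutes or hours is governed by how many parts sit under that prefix, not by how big the warehouse is.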


Shape files for parallelism

Goal: Present many independent parts so Snowflake can load them concurrently.

  • Target size: Treat ≤ 250 MB compressed as “small” for parallel copy; sweet spot ≈ 100–250 MB per file.
  • Prefer columnar: Parquet/ORC/Avro reduce bytes and parsing overhead.
  • Avoid extremes: Very large files starve parallelism; ultra-tiny files add orchestration overhead. Aim for the low hundreds of MB per part (a quick size check is sketched after this list).
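
One way to confirm the parts landed in the target band is to list the stage and aggregate the sizes. This sketch reuses the illustrative @orders_stage prefix:

  LIST @orders_stage/2025-09-23/;

  -- Summarize the LIST output just returned: file count and compressed sizes in MB.
  SELECT COUNT(*)                   AS file_count,
         AVG("size") / 1024 / 1024  AS avg_size_mb,
         MAX("size") / 1024 / 1024  AS max_size_mb
  FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

If max_size_mb sits far above ~250, split further upstream before loading.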

Right-size the warehouse to files in flight

  • Few files → X-Small/Small is typically optimal.
  • Hundreds of files → consider Medium/Large/XL so additional execution slots actually work.
  • Autosuspend/Autoresume: Enable AUTO_SUSPEND (≈ 60–120 s) and AUTO_RESUME. Batch work to avoid paying multiple minimum billing windows.

More warehouse capacity helps only if you expose enough files concurrently to occupy the extra slots.
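
A dedicated load warehouse with aggressive suspension might look like the following sketch; the name load_wh_small and the 60-second timeout are assumptions to adapt:

  CREATE WAREHOUSE IF NOT EXISTS load_wh_small
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND   = 60      -- suspend after ~60 s idle to stop credit burn
    AUTO_RESUME    = TRUE    -- wake automatically when the next COPY arrives
    INITIALLY_SUSPENDED = TRUE;

Batch the COPY statements so the warehouse runs once per window instead of resuming repeatedly and paying the minimum billing increment each time.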



Cost traps (and how to dodge them)

  • Big warehouse + few files → idle slots = more credits, no speed. Split files or size down.
  • Idle billing → long gaps without AUTO_SUSPEND. Batch tightly; suspend fast.
  • Heavy parse costs → publish Parquet upstream when possible.

Guardrail: set Resource Monitors with SUSPEND + NOTIFY so a misconfigured job can’t run away on spend.
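
A minimal sketch of such a guardrail, assuming a monthly quota of 100 credits and the illustrative load_wh_small warehouse from above:

  CREATE OR REPLACE RESOURCE MONITOR load_monitor
    WITH CREDIT_QUOTA = 100
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80  PERCENT DO NOTIFY    -- early warning
             ON 100 PERCENT DO SUSPEND;  -- stop new queries once the quota is hit

  ALTER WAREHOUSE load_wh_small SET RESOURCE_MONITOR = load_monitor;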


Throughput tuning for bulk-loading

  • Match slots to files: Ensure each batch exposes enough files to keep the warehouse busy. If you have ~40 parts, Medium may beat Small on wall-clock; pick whichever combination minimizes cost (duration × credits per second) while still meeting your SLA (a comparison query is sketched after this list).
  • Format matters: CSV/JSON parsing is CPU-heavier than Parquet; convert at the source when you can.
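
To compare configurations empirically, load the same test batch on two warehouse sizes and check wall-clock and credit burn. This sketch uses COPY_HISTORY and the account-usage metering view; the ORDERS table and warehouse names are assumptions:

  -- Per-file load times and row counts for the last 24 hours.
  SELECT file_name, last_load_time, row_count, status
  FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
         TABLE_NAME => 'ORDERS',
         START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));

  -- Credits consumed per load warehouse over the same window (view lags by a few hours).
  SELECT warehouse_name, SUM(credits_used) AS credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
  WHERE start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
    AND warehouse_name IN ('LOAD_WH_SMALL', 'LOAD_WH_MEDIUM')
  GROUP BY warehouse_name;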

Teradata → Snowflake: what to emulate

  • FastLoad playbook: In Teradata you saturated parallel lanes by raising sessions; in Snowflake you achieve the same by publishing many file parts.
  • Keep the hot path simple: Stage first, transform later; avoid constraints and extra work at load time (see the sketch after this list).
  • Size for the real bottleneck: In Snowflake that’s files in flight, not raw GB.
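
A sketch of the "stage first, transform later" pattern, assuming an illustrative constraint-free staging table ORDERS_RAW, the curated ORDERS table, and example column names:

  -- Hot path: plain COPY into the staging table, no transforms, no constraints.
  COPY INTO orders_raw
    FROM @orders_stage/2025-09-23/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

  -- Cold path: cast, dedupe, and enrich afterwards, off the load critical path.
  INSERT INTO orders
  SELECT DISTINCT order_id, customer_id, order_ts::TIMESTAMP_NTZ, amount
  FROM orders_raw;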

Small-loads vs. huge-loads pattern

  • Small files (≤ 250 MB): Run COPY on a small load warehouse. If you must launch many independent COPY statements simultaneously, enable multi-cluster on that warehouse and submit COPY in parallel sessions; Snowflake scales out clusters when there’s queuing and suspends them as the surge drains.
  • Huge loads (originally > 250 MB): Split first (to ≤ 250 MB parts) and size a bigger load warehouse to the number of parts you want in flight (dozens for Medium, more for Large/XL). The gain comes from more files in parallel, not from throwing a large warehouse at a single file!

Two purpose-built load warehouses—one tuned for small-file surges, one sized for large split-file batches—keep performance high and costs predictable.
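
A sketch of those two warehouses, reusing the illustrative load_wh_small from above and adding a larger load_wh_bulk; note that multi-cluster warehouses require Enterprise edition or higher:

  -- Small-file surges: let concurrent COPY sessions scale out and drain back automatically.
  ALTER WAREHOUSE load_wh_small SET
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 3
    SCALING_POLICY    = 'STANDARD';

  -- Huge loads, pre-split into <= 250 MB parts: size for the parts you want in flight.
  CREATE WAREHOUSE IF NOT EXISTS load_wh_bulk
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND   = 60
    AUTO_RESUME    = TRUE
    INITIALLY_SUSPENDED = TRUE;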


Minimal, repeatable runbook

  1. Split big sources to ≤ 250 MB compressed parts (aim 100–250 MB).
  2. Pick warehouse size based on files available concurrently; enable AUTO_SUSPEND/RESUME.
  3. If many jobs: enable multi-cluster and launch COPY in parallel sessions; let Snowflake scale out/back automatically.
  4. Monitor spend: Resource Monitors; compare Small vs. Medium/Large to find the cheapest config meeting your SLA.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>