In this blog post, I will show you what you need to pay special attention to when switching from Teradata to Snowflake.

We will analyze the impact of running DML statements (e.g. UPDATE or DELETE) in Snowflake and contrast it with Teradata’s Row Storage and Column Storage.

What happens when Teradata Rows are deleted from a table that is not Column-Partitioned?

When deleting, the rows in the data blocks are released. Teradata uses an array of pointers in each data block that points to each row’s physical position within the data block.

Data blocks are merged if too few rows remain in a data block.

An UPDATE statement causes the data block to be assembled in memory.
Depending on the effect of the update on the used space within a data block, it is split into several blocks or merged.

What happens when Teradata Rows are deleted from a table that is Column-Partitioned?

If all rows or an entire row partition of a table is deleted, they are physically deleted.

Otherwise, the rows to be deleted are only marked with a delete flag but continue to occupy space, which is necessary to keep the column containers synchronous.

The old Row-Based Engine comes into play during the UPDATE:
It selects the needed rows and assembles the column containers into rows. The rows to be changed will be marked with a delete flag but not deleted. The changed rows are transformed into column format and saved.

What happens when data is deleted or modified in Snowflake?

Snowflake stores data in micro partitions between 5 and 500 megabytes.

Every change (UPDATE, INSERT or DELETE) to a micro partition means that Snowflake must replace the entire micro partition.

Making changes in an existing micro partition is impossible because Amazon S3 does not implement this.

Micro partitions are created when loading an empty table. If data is added, new micro partitions are created.

If a change through an UPDATE or DELETE statement is necessary, Snowflake deletes the existing micro partition and replaces it with a new one.

I assume that micro partitions will not be deleted immediately but will remain available for the Time Travel feature for a while.

How can the performance of both systems be assessed?

Currently, I am missing more details about the implementation in Snowflake. But I will stay on it and inform you as soon as I know more!

  • When an update is performed in Snowflake, the file belonging to this row is copied to a new one with the changes. Now you have 2 files. In the FoundationDB database, each table has a list of current files that belong to that table. During this update operation, the database is repointed to the new file. When a select operation is executed, this FoundationDB is checked for a current list of files, once that is retreived, the statistics on each file are applied to the query and only the files needed are read. Hope this helps.

    • Thank you very much for the explanation. Do you know maybe how joins are physically done in Snowflake? In Teradata, we can get information about all join methods, but I can’t find anything about Snowflake…

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like