Teradata Vs Snowflake – Deleting And Updating

Roland Wenzlofsky

January 26, 2020

minutes reading time

In this blog post, I will show you what you need to pay special attention to when switching from Teradata to Snowflake.

We will look at the effect of executing DML statements (such as UPDATE or DELETE) in Snowflake and compare this with Teradata Row Storage and Column Storage.

What happens when Teradata Rows are deleted from a table that is not Column-Partitioned?

When deleting, the rows in the data blocks are released. Teradata uses an array of pointers in each data block that points to each row’s physical position within the data block.

Data blocks are merged if too few rows remain in a data block.

An UPDATE statement causes the data block to be assembled in memory.
Depending on the effect of the update on the used space within a data block, it is split into several blocks or merged.

What happens when Teradata Rows are deleted from a table that is Column-Partitioned?

If all rows or an entire row partition of a table is deleted, they are physically deleted.

Otherwise, the rows to be deleted are only marked with a delete flag but continue to occupy space, which is necessary to keep the column containers synchronous.

The old Row-Based Engine comes into play during the UPDATE:
It selects the needed rows and assembles the column containers into rows. The rows to be changed will be marked with a delete flag but not deleted. The changed rows are transformed into column format and saved.

What happens when data is deleted or modified in Snowflake?

Snowflake stores data in micro partitions between 5 and 500 megabytes.

Every change (UPDATE, INSERT or DELETE) to a micro partition means that Snowflake must replace the entire micro partition.

It is impossible to make changes in an existing micro partition because Amazon S3 does not implement this.

Micro partitions are created when loading an empty table. If data is added, new micro partitions are created.

If a change through an UPDATE or DELETE statement is necessary, Snowflake deletes the existing micro partition and replaces it with a new one.

I assume that micro partitions will not be deleted immediately but will remain available for the Time Travel feature for a while.

How can the performance of both systems be assessed?

Currently, I am missing more details about the implementation in Snowflake. But I will stay on it and inform you as soon as I know more!

  • When an update is performed in Snowflake, the file belonging to this row is copied to a new one with the changes. Now you have 2 files. In the FoundationDB database, each table has a list of current files that belong to that table. During this update operation, the database is repointed to the new file. When a select operation is executed, this FoundationDB is checked for a current list of files, once that is retreived, the statistics on each file are applied to the query and only the files needed are read. Hope this helps.

    • Thank you very much for the explanation. Do you know maybe how joins are physically done in Snowflake? In Teradata, we can get information about all join methods, but I can’t find anything about Snowflake…

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like