In this blog post, I will show you what you need to pay special attention to when switching from Teradata to Snowflake.
We will look at the effect of executing DML statements (such as UPDATE or DELETE) in Snowflake and compare this with Teradata Row Storage and Column Storage.
What happens when Teradata Rows are deleted from a table that is not Column-Partitioned?
When deleting, the rows in the data blocks are released. Teradata uses an array of pointers in each data block that points to each row’s physical position within the data block.
Data blocks are merged if too few rows remain in a data block.
An UPDATE statement causes the data block to be assembled in memory.
Depending on the effect of the update on the used space within a data block, it is split into several blocks or merged.
What happens when Teradata Rows are deleted from a table that is Column-Partitioned?
If all rows or an entire row partition of a table is deleted, they are physically deleted.
Otherwise, the rows to be deleted are only marked with a delete flag but continue to occupy space, which is necessary to keep the column containers synchronous.
The old Row-Based Engine comes into play during the UPDATE:
It selects the needed rows and assembles the column containers into rows. The rows to be changed will be marked with a delete flag but not deleted. The changed rows are transformed into column format and saved.
What happens when data is deleted or modified in Snowflake?
Snowflake stores data in micro partitions between 5 and 500 megabytes.
Every change (UPDATE, INSERT or DELETE) to a micro partition means that Snowflake must replace the entire micro partition.
It is impossible to make changes in an existing micro partition because Amazon S3 does not implement this.
Micro partitions are created when loading an empty table. If data is added, new micro partitions are created.
If a change through an UPDATE or DELETE statement is necessary, Snowflake deletes the existing micro partition and replaces it with a new one.
I assume that micro partitions will not be deleted immediately but will remain available for the Time Travel feature for a while.
How can the performance of both systems be assessed?
Currently, I am missing more details about the implementation in Snowflake. But I will stay on it and inform you as soon as I know more!