What Snowflake Does Better Than Teradata: Zero-Copy Clones
What Are Zero-Copy Clones?
We all know the following situations: For certain operations, we need to make a backup of tables before making changes.
If the tables are relatively small, this is not a big issue. A copy can be made quickly and in case of an error, it is easy to revert to the backup version of a table.
If the table is very large, it can take a long time to make a copy.
In addition, the required space is doubled. Especially at the go-live of new applications this usually means an enormous expenditure of time and space.
In Teradata, this approach is unfortunately unavoidable.
Let's see what other database systems do better here. As an example, I'll take Snowflake as I think a lot of things have been done right to solve the above-mentioned problems.
Snowflake utilizes the limitations of Amazon S3. Data is stored in micro partitions which are basically S3 files. Since S3 files are not changeable, every DML statement that makes changes to data requires a new S3 file to be created, which replaces the old one.
Snowflake simply does not delete the original file but keeps it for a period of time (how long depends on which edition of the database is used).
Snowflake takes advantage of the need to replace entire S3 files when data is changed.
If a zero-copy clone of a table is created the clone uses no storage because it shares the existing micro-partitions of the original table at the time it was cloned. For the cloned table, only pointers are set which point to the micro partitions of the existing table.
Data can then be inserted, deleted, or updated in the clone independently from the original table. Each change to the clone causes a new micro-partition which is owned by the clone.
Later changes in the original table are of course not taken over to the clone.
The following syntax is available in Snowflake:
CREATE TABLE CustomerCopy CLONE Customer;
No additional space is required beforehand and cloning is really fast.
Snowflake claims that a table with one terabyte can be cloned in 7 minutes on a small warehouse. I think this is where Teradata can definitely compete although it creates a deep copy of the table.
In Snowflake you can even create instant backups of databases in a very short time by cloning a whole database.
Cloning is, of course, no replacement for a disaster recovery solution with a backup which is stored redundantly.
As we have seen, many things are made easy by the possibility to create so-called zero-copy clones (or shallow copies of tables).
As far as I know, no massive parallel shared-nothing system (such as Teradata) offers this possibility.
Could this feature even be introduced on Teradata? To be honest, I am not sure. One condition definitely would be that the original table and the clone have the same primary index. In general, I think that the internal structures (Cylinder Index, Master Index, etc.) of Teradata are unsuitable for the creation of shallow copies.
Do you know of another MPP system beside Snowflake that allows Zero-Copy cloning? Let me know and leave a comment!