What Are Zero-Copy Clones?
We all know the following situations: For certain operations, we need to make a backup of tables before making changes.
If the tables are relatively small, this is not a big issue. A copy can be made quickly, and in case of an error, it is easy to revert to the backup version of a table.
If the table is very large, it can take a long time to make a copy.
In addition, the required space is doubled. Especially at the go-live of new applications, this usually means an enormous expenditure of time and space.
In Teradata, this approach is unfortunately unavoidable.
Let’s see what other database systems do better here. As an example, I’ll take Snowflake as I think many things have been done right to solve the above-mentioned problems.
What is the basis for zero-copy clones in Snowflake?
Snowflake utilizes the limitations of Amazon S3. Data is stored in micro partitions, which are basically S3 files. Since S3 files are not changeable, every DML statement that makes changes to data requires a new S3 file to be created, which replaces the old one.
Snowflake simply does not delete the original file but keeps it for a period of time (how long depends on which edition of the database is used).
How do zero-copy clones work in Snowflake?
Snowflake takes advantage of the need to replace entire S3 files when data is changed.
If a zero-copy clone of a table is created, the clone uses no storage because it shared the original table’s existing micro-partitions when it was cloned. Only pointers are set for the cloned table, which points to the existing table’s micro partitions.
Data can then be inserted, deleted, or updated in the clone independently from the original table. Each change to the clone causes a new micro-partition which is owned by the clone.
Later changes in the original table are, of course, not taken over to the clone.
The following syntax is available in Snowflake:
CREATE TABLE CustomerCopy CLONE Customer;
What is the advantage of zero-copy clones over the traditional method of copying tables?
No additional space is required beforehand, and cloning is really fast.
Snowflake claims that a table with one terabyte can be cloned in 7 minutes on a small warehouse. I think this is where Teradata can definitely compete, although it creates a deep copy of the table.
In Snowflake, you can even create instant backups of databases in a very short time by cloning a whole database.
Is zero-copy cloning a replacement for a database backup?
Cloning is, of course, no replacement for a disaster recovery solution with a backup, which is stored redundantly.
As we have seen, many things are made easy by the possibility of creating so-called zero-copy clones (or shallow copies of tables).
As far as I know, no massive parallel shared-nothing system (such as Teradata) offers this possibility.
Could this feature even be introduced on Teradata? To be honest, I am not sure. One condition definitely would be that the original table and the clone have the same primary index. In general, I think that the internal structures (Cylinder Index, Master Index, etc.) of Teradata are unsuitable for creating shallow copies.
Do you know of another MPP system beside Snowflake that allows Zero-Copy cloning? Let me know and leave a comment!