What Are Zero-Copy Clones?
We all know the following situations: For certain operations, we need to make a backup of tables before making changes.
If the tables are relatively small, this is not a big issue. A copy can be made quickly, and in case of an error, it is easy to revert to the backup version of a table.
If the table is enormous, it can take a long time to make a copy.
In addition, the required space is doubled. Especially at the go-live of new applications, this usually means an enormous expenditure of time and space.
In Teradata, this approach is unfortunately unavoidable.
Let’s see what other database systems do better here. For example, I’ll take Snowflake as I think many things have been done right to solve the problems mentioned above.
What is the basis for zero-copy clones in Snowflake?
Snowflake utilizes the limitations of Amazon S3. Data is stored in micro partitions, which are S3 files. Since S3 files are not changeable, every DML statement that makes data changes requires a new S3 file to be created, which replaces the old one.
Snowflake does not delete the original file but keeps it for some time (depending on which edition of the database is used).
How do zero-copy clones work in Snowflake?
Snowflake takes advantage of the need to replace entire S3 files when data is changed.
If a zero-copy clone of a table is created, the clone uses no storage because it shared the original table’s existing micro-partitions when it was cloned. Only pointers are set for the cloned table, pointing to the existing table’s micro partitions.
Data can then be inserted, deleted, or updated independently from the original table in the clone. Each change to the clone causes a new micro-partition owned by the clone.
Later changes in the original table are, of course, not taken over to the clone.
The following syntax is available in Snowflake:
CREATE TABLE CustomerCopy CLONE Customer;
What is the advantage of zero-copy clones over the traditional method of copying tables?
No additional space is required beforehand, and cloning is fast.
Snowflake claims a table with one terabyte can be cloned in 7 minutes in a small warehouse. I think this is where Teradata can compete, although it creates a deep copy of the table.
In Snowflake, you can even create instant backups of databases in a short time by cloning a whole database.
Is zero-copy cloning a replacement for a database backup?
Cloning is no replacement for a disaster recovery solution with a backup, which is stored redundantly.
As we have seen, many things are made easy by the possibility of creating so-called zero-copy clones (or shallow copies of tables).
As far as I know, no massive parallel shared-nothing system (such as Teradata) offers this possibility.
Could this feature even be introduced on Teradata? I am not sure. One condition would be that the original table and the clone have the same primary index. Teradata’s internal structures (Cylinder Index, Master Index, etc.) are unsuitable for creating shallow copies.
Do you know of another MPP system besides Snowflake that allows Zero-Copy cloning? Let me know and leave a comment!