What Are Zero-Copy Clones?
We are all familiar with creating a backup copy of a table before performing a risky operation on it.
This is not a significant problem for smaller tables as duplicates can be swiftly produced and any errors can be rectified by reverting to the backup table.
If the table is enormous, making a copy can take a long time.
In addition, the required space is doubled. Especially at the go-live of new applications, this usually means an enormous expenditure of time and space.
Unfortunately, this approach is unavoidable in Teradata.
Other database systems handle this more effectively. Snowflake, for instance, solves both the time and the space problem.
What is the basis for zero-copy clones in Snowflake?
Snowflake turns a limitation of cloud object storage such as Amazon S3 into an advantage. Data is stored in micro-partitions, which on AWS are S3 files. Since S3 files cannot be changed in place, every DML statement that changes data requires a new S3 file to be created, which replaces the old one.
Snowflake does not delete the original file immediately but keeps it for the Time Travel retention period. How long that can be depends on the edition: up to one day in the Standard edition and up to 90 days in the Enterprise edition and above.
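How long replaced files are kept can be inspected and set per table through the DATA_RETENTION_TIME_IN_DAYS parameter. The following is only a sketch: the Customer table is the one used in the example of this article, and the value of one day is just an illustration.

```sql
-- Show the retention period currently in effect for the table.
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE Customer;

-- Keep replaced micro-partitions for one day (illustrative value;
-- the allowed maximum depends on the edition).
ALTER TABLE Customer SET DATA_RETENTION_TIME_IN_DAYS = 1;
```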
How do zero-copy clones work in Snowflake?
Snowflake takes advantage of the need to replace entire S3 files when data is changed.
If a zero-copy clone of a table is created, the clone uses no additional storage because it shares the original table’s existing micro-partitions at the time of cloning. Only pointers are set for the cloned table, pointing to the existing table’s micro-partitions.
Data can then be inserted, deleted, or updated in the clone independently of the original table. Each change to the clone creates new micro-partitions that are owned by the clone alone.
Later changes to the original table are, of course, not propagated to the clone.
The following syntax is available in Snowflake:
CREATE TABLE CustomerCopy CLONE Customer;
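A typical use is the backup-before-change scenario from the beginning of this article. The sketch below assumes a hypothetical clone name (CustomerBackup) and hypothetical columns (Status, LastOrderDate); only the Customer table name comes from the example above.

```sql
-- Safety net: create a zero-copy clone before the risky change.
CREATE TABLE CustomerBackup CLONE Customer;

-- Risky change on the original table (hypothetical columns).
UPDATE Customer
SET    Status = 'INACTIVE'
WHERE  LastOrderDate < '2020-01-01';

-- The clone is unaffected and still holds the data as of cloning time.
SELECT COUNT(*) FROM CustomerBackup WHERE Status = 'INACTIVE';

-- If the change turns out to be wrong, exchange the two tables;
-- Customer then holds the original data again.
ALTER TABLE Customer SWAP WITH CustomerBackup;

-- After the swap, CustomerBackup contains the unwanted state and can go.
DROP TABLE CustomerBackup;
```

If the change was correct, the swap is skipped and only the DROP TABLE is needed.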
What is the advantage of zero-copy clones over the traditional method of copying tables?
No additional space is required beforehand, and cloning is fast.
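One way to check the storage claim is the TABLE_STORAGE_METRICS view in the information schema. This is a sketch, assuming the Customer and CustomerCopy tables from the example were created with unquoted names (and are therefore stored in upper case); the figures in this view may lag behind recent changes.

```sql
-- Compare the storage owned by the original table and by its clone.
-- A freshly created clone should report no active bytes of its own,
-- and both tables should show the same clone_group_id.
SELECT table_name,
       clone_group_id,
       active_bytes,
       retained_for_clone_bytes
FROM   information_schema.table_storage_metrics
WHERE  table_name IN ('CUSTOMER', 'CUSTOMERCOPY');
```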
Snowflake claims that a one-terabyte table can be cloned in 7 minutes on a small warehouse. I think Teradata can compete with that figure, although it creates a deep copy of the table.
In Snowflake, you can even clone an entire database to get a backup of all its tables in a short time.
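Cloning a database uses the same CLONE keyword as cloning a table. The database names below are hypothetical and only illustrate the pattern:

```sql
-- Clone a whole database, including its schemas and tables,
-- for example right before a go-live.
CREATE DATABASE SalesDb_PreGoLive CLONE SalesDb;

-- Remove the clone once it is no longer needed.
DROP DATABASE SalesDb_PreGoLive;
```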
Is zero-copy cloning a replacement for a database backup?
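Cloning does not replace a disaster recovery solution with a redundantly stored backup. A clone shares its micro-partitions with the original table, so it offers no protection if the underlying files are lost.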
As demonstrated, creating zero-copy clones, also known as shallow copies of tables, simplifies many processes.
To my knowledge, no parallel shared-nothing system like Teradata provides this capability.
Is it feasible to introduce this feature on Teradata? I am uncertain. A prerequisite is that the source table and its copy have identical primary indexes. Teradata’s internal structures, such as the Cylinder Index and the Master Index, are not designed for creating shallow copies.
Are there any other MPP systems that support Zero-Copy cloning besides Snowflake? If so, please share by leaving a comment below.