Teradata Data Loss Protection: Transient Journal, Fallback, RAID 1, and Clique

Roland Wenzlofsky

April 21, 2023

Teradata offers a range of features to safeguard against potential data loss, each providing distinct and granular protection. While some features are enabled by default, others require explicit activation.

1. The Teradata Transient Journal – Transaction Failure Protection

Transaction changes are logged in the Teradata transient journal, also known as the transaction log.

Teradata tables are typically distributed across all AMPs based on their Primary Index values, but this is not the case for the transient journal. Instead, each AMP maintains its own transient journal.

Before a row is modified, the transient journal stores a copy of the original row (the before-image). If the transaction fails, the system uses these copies to restore the table to the state it had before the transaction started. The copies are held only temporarily and are deleted once the transaction completes successfully. The parsing engine ensures that all AMPs have finished their work successfully before a transaction is committed.
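The before-image mechanism can be illustrated with a small Python sketch. It is a toy model, not Teradata's implementation: the class `Amp`, the function `run_transaction`, and the convention that a `None` value simulates a failing statement are all assumptions made for this example.

```python
_MISSING = object()  # marks "row did not exist before the change"


class Amp:
    """Toy AMP: a row store plus its own local transient journal."""

    def __init__(self):
        self.rows = {}     # key -> row value
        self.journal = []  # before-images as (key, old value) pairs

    def update(self, key, new_value):
        # Log the before-image first, then change the row.
        self.journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = new_value

    def rollback(self):
        # Restore before-images in reverse order, then empty the journal.
        for key, old_value in reversed(self.journal):
            if old_value is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = old_value
        self.journal.clear()

    def commit(self):
        # After a successful transaction the journal entries are discarded.
        self.journal.clear()


def run_transaction(amps, changes):
    """Apply (amp_index, key, value) changes; roll back every AMP on error."""
    try:
        for amp_index, key, value in changes:
            if value is None:
                raise ValueError("simulated statement failure")
            amps[amp_index].update(key, value)
    except ValueError:
        for amp in amps:
            amp.rollback()
        return False
    # Commit only after every AMP has done its part successfully.
    for amp in amps:
        amp.commit()
    return True
```

Each `Amp` object keeps its own journal, mirroring the point that the transient journal is local to each AMP rather than distributed by Primary Index.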

2. Teradata Fallback – AMP Failure Protection

The transient journal safeguards your data solely from transaction failures. However, in the event of an AMP failure, your data may become inaccessible until the AMP is restored. The fallback protection strategy was introduced to guarantee the accessibility of data in such circumstances.

With fallback protection, each row is stored twice: a secondary hash map assigns every row to a second AMP, which keeps the duplicate. Tables with fallback protection therefore require twice the space.

Fallback protection keeps the data accessible when an AMP fails, because the fallback AMP serves the affected rows until the failed AMP is restored.

Your table only becomes inaccessible if the fallback AMP fails as well.

AMPs are organized into clusters to provide optimal protection. Each cluster consists of a primary AMP and a fallback AMP, which safeguard each other.

The primary and fallback AMPs are not located on the same physical node, so a single node failure cannot take out both copies of a row. If a node fails completely, fallback protection still allows access to the data.
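A minimal Python sketch of the placement idea follows. The hash function (CRC32), the "next AMP in the same cluster" rule, and all function names are assumptions chosen for illustration; Teradata's actual hash maps work differently. The sketch also assumes the number of AMPs is a multiple of the cluster size.

```python
import zlib


def primary_amp(key, num_amps):
    # Stand-in for the row hash; CRC32 is used only for illustration.
    return zlib.crc32(key.encode("utf-8")) % num_amps


def fallback_amp(primary, cluster_size):
    # Hypothetical rule: the copy goes to the next AMP in the same cluster.
    cluster_start = (primary // cluster_size) * cluster_size
    return cluster_start + (primary - cluster_start + 1) % cluster_size


def readable_from(key, num_amps, cluster_size, down_amps):
    """Return the AMP that can serve the row, or None if both copies are down."""
    primary = primary_amp(key, num_amps)
    if primary not in down_amps:
        return primary
    backup = fallback_amp(primary, cluster_size)
    return backup if backup not in down_amps else None
```

With a cluster size of 2, as described above, the two AMPs of a cluster hold each other's fallback rows, and a row becomes unreadable only when both AMPs of its cluster are down at the same time.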

3. RAID 1 – Disk Failure Protection

Fallback protection safeguards against the failure of an AMP, which runs as a virtual process (vproc) on a node of the Teradata system.

Teradata implements RAID 1 mirroring for data protection against disk failures. Each AMP is assigned one virtual disk (VDISK) comprising multiple physical disks. Half of these disks store the mirrored data.

As with fallback protection, this feature doubles the disk space required.

If a disk fails, its mirror disk takes over, and no data is lost. However, data may be permanently lost if both the primary and the mirror disk fail and no fallback protection is in place.
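The mirroring behavior and the combined space cost can be sketched in Python. The class `MirroredPair` and the function `raw_space_needed` are hypothetical names for this illustration, and the space calculation ignores any overhead other than the two doublings described in this article.

```python
class MirroredPair:
    """Toy RAID 1 pair: every write goes to both disks."""

    def __init__(self):
        self.disks = [{}, {}]  # block number -> data
        self.failed = [False, False]

    def write(self, block, data):
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = data

    def fail(self, index):
        self.failed[index] = True
        self.disks[index] = {}  # contents of a failed disk are gone

    def read(self, block):
        # Serve the block from the first surviving disk that has it.
        for i, disk in enumerate(self.disks):
            if not self.failed[i] and block in disk:
                return disk[block]
        raise IOError("block lost: no surviving copy")


def raw_space_needed(user_data_gb, fallback=True, raid1=True):
    # Each protection layer stores a second copy, so each doubles the space.
    factor = (2 if fallback else 1) * (2 if raid1 else 1)
    return user_data_gb * factor
```

Under these assumptions, a table protected by both fallback and RAID 1 occupies roughly four times its net size on physical disk.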

4. The Teradata Clique – Node Failure Protection

The clique configuration adds another layer of protection to the Teradata system by guarding against node failures. Nodes are grouped into cliques so that the AMPs of a failed node can be taken over by the other nodes in the same group.

The AMPs of the failed node migrate to other nodes within the same clique and remain fully operational. However, the receiving nodes carry a heavier workload, which degrades performance.

Teradata provides hot standby nodes to overcome this restriction. These nodes take over the AMPs of a failed node. As the hot standby nodes are not involved in routine operations, there is no performance degradation.

The AMP migration requires a system restart, and another restart is necessary once the failed node resumes operation.
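The difference between migrating AMPs to working nodes and to a hot standby node can be sketched in Python. The function `migrate_amps`, the node names, and the round-robin distribution are assumptions for this example; the article does not state how Teradata actually spreads the AMPs across the surviving nodes.

```python
def migrate_amps(clique, failed_node, hot_standby=None):
    """Return a new node -> AMP list mapping after `failed_node` goes down.

    `clique` maps node names to lists of AMP ids. If `hot_standby` names an
    idle node, it takes over all AMPs of the failed node; otherwise the AMPs
    are spread round-robin over the surviving nodes (an assumed policy).
    """
    result = {
        node: list(amps) for node, amps in clique.items() if node != failed_node
    }
    orphaned = clique[failed_node]
    if hot_standby is not None:
        result[hot_standby] = result.get(hot_standby, []) + list(orphaned)
        return result
    survivors = sorted(result)
    for i, amp in enumerate(orphaned):
        result[survivors[i % len(survivors)]].append(amp)
    return result
```

Without a standby node, each surviving node ends up with more AMPs than before, which is the source of the performance penalty. With a standby node, the working nodes keep their original AMP count.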