A Teradata Transient Journal safeguards transactions against system failures and deadlocks by serving as a system-wide dictionary table. Each AMP has its own local Transient Journal that stores table rows before a transaction modifies them. If a transaction aborts, the Transient Journal comes into play, restoring the contents of the affected table rows to roll back the changes. Once a transaction completes, the related rows are removed from the Transient Journal. Though short-lived compared to a Permanent Journal, the Transient Journal’s impact on performance can be significant: transactions that update billions of rows require equal space for journal entries. It is impossible to manually disable the Transient Journal since doing so would compromise the ACID properties of a transaction.
Users do not have access to the transient journal, which is located in the DBC database. Ensuring sufficient permanent space in the DBC database is essential to prevent system-level errors.
Table rows and their associated transient journal rows are consistently located on the same AMP. The distribution of these rows follows the customary hashing algorithm.
The transient journal facilitates the rollback of changes in the event of an error or manual transaction abortion. It writes altered rows back to the base table, ensuring data integrity. Such a feature is indispensable in a relational database system like Teradata.
The utilization of the transient journal has a negative impact on performance. Teradata, however, employs techniques to mitigate journaling and circumvent the row-by-row logging mechanism.
Processing large tables using DML statements like UPDATE, DELETE, or INSERT can result in poor performance due to the maintenance of the transient journal. This is because each altered row is duplicated in the journal table, increasing the workload. Additionally, the transient journal is stored in the DBC database, which can lead to permanent space loss. If any workloads depend on the transient journal, they will be terminated, and clients will receive error messages.
A skewed base table can magnify the Transient Journal as it will also become skewed. In the event of a failed load into a skewed table, the rollback can be prolonged and only resolved through forced termination, resulting in an inconsistent table. It is important to note that the Transient Journal operates at the AMP level, requiring the AMPs containing the highest number of rows to handle most of the workload during a rollback on a skewed table.
Teradata provides methods to circumvent the transient journal or store a single row in it solely.
There are situations where a row-by-row rollback is unnecessary. Here, I will present techniques that can enhance efficiency and prevent problems.
INSERT into an empty table.
An example of how modifying a query can circumvent the need for the transient journal is provided below. Initially, we construct a statement that necessitates journaling and registers the alterations row by row. While the first INSERT INTO statement does not demand row-level journaling, the second must rely on it:
INSERT INTO target SELECT col FROM table1;
INSERT INTO target SELECT col FROM table2;
To avoid using the transient journal for both INSERT INTO statements, we can rewrite the query using a UNION:
INSERT INTO target SELECT col FROM table1 UNION ALL SELECT col FROM table2;
Replace UPDATE with INSERT INTO … SELECT into a copy of the target table
It is advisable to refrain from using UPDATE statements on heavily imbalanced tables because they are not optimized for Teradata’s parallel architecture and tend to have poor performance.
UPDATE target SET col2 = 1; -- assumption: col2 is skewed
This method is superior because it bypasses the transient journal, mitigating the possibility of a precarious and protracted rollback. By writing to an empty table, the transient journal is not utilized. Only a single row indicates that the table can be completely deleted in the event of a rollback.
INSERT INTO copy_of_target SELECT col1, 1 AS col2 FROM target;
DROP TABLE target;
RENAME TABLE copy_of_target TO target;
A BTEQ trick
To avoid the transient journal in BTEQ, insert a semicolon at the start of each line for multiple INSERT statements in one table using the INSERT INTO statement. Example:
INSERT INTO target SELECT * FROM table1 -- The semicolon starts on the following line!
; INSERT INTO target SELECT * FROM table2 -- The semicolon has to be the first character!
Load Utilities and the Teradata Transient Journal
Teradata’s bulk load utilities (TPT Bulk Load, Fastload, Multiload) are optimal for loading vast amounts of data, outperforming transactional loads like BTEQ or TPump. To learn more about the scenarios in which bulk loads outshine transactional loads, refer to the following article:
HO To update a huge table means out of 10 billion records i want to 1 milion records?
Valuable update on TJ….
Can we manually purge the TJ, there is no/less activity but it is consuming more than 2.5TB of space
Regards
Ashok
Hi Roland,
I suspect that a MERGE statement is also lower in TJ or at least more efficient. The orange book seems to hint at that, but only compares multi-load apply and ANSI merge.
Regards, Johannes