A Teradata Transient Journal is a dictionary table maintained across the whole system to protect transactions against system failures and deadlocks. Each AMP manages its local Transient Journal, where table rows are copied before being changed by a transaction. Should a transaction fail to complete (i.e., it aborts), the Transient Journal is used to roll back and restore the contents of the affected table rows. The related rows are removed from the Transient Journal when the transaction completes. Despite being short-lived compared to a Permanent Journal, the Transient Journal can significantly affect performance: a transaction that updates billions of rows requires roughly the same amount of space for journal entries. The Transient Journal cannot be manually disabled; otherwise, it would not be possible to guarantee the ACID properties of a transaction.
As users, we have no access to the transient journal. The table is located in the DBC database, so we must ensure that there is always enough permanent space in DBC to avoid system-level errors.
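To keep an eye on this, we can check how much of DBC's permanent space is in use. Here is a sketch using the DBC.DiskSpaceV view; the exact thresholds worth alerting on depend on your system:

```sql
-- How full is the DBC database across all AMPs?
SELECT DatabaseName,
       SUM(CurrentPerm) AS CurrentPerm,
       SUM(MaxPerm)     AS MaxPerm,
       SUM(CurrentPerm) * 100.0 / NULLIF(SUM(MaxPerm), 0) AS PctUsed
FROM DBC.DiskSpaceV
WHERE DatabaseName = 'DBC'
GROUP BY 1;
```

If PctUsed approaches 100, running transactions can fail for lack of journal space.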
Base table rows and their transient journal rows always reside on the same AMP; the rows are distributed, as usual, by the hashing algorithm.
In case of an error, or if a transaction is aborted manually, the transient journal is used to roll back the changes: all changed rows are restored from the transient journal to the base table. This is an essential function of a relational database system. Without a transient journal, Teradata could not guarantee data integrity.
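As a small sketch of this mechanism (assuming a Teradata-mode session and an existing table named target):

```sql
BT;                          -- begin an explicit transaction (Teradata session mode)
UPDATE target SET col2 = 0;  -- before-images of every changed row go to the transient journal
ABORT;                       -- abort the transaction: the before-images are written back
```

After the ABORT, the table contents are exactly as they were before the UPDATE.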
Of course, using the transient journal negatively impacts performance. Therefore, Teradata has techniques to minimize journaling and bypass the row-by-row logging mechanism.
Especially when we process huge tables with DML statements (UPDATE, DELETE, INSERT), maintaining the transient journal can hurt performance because each changed row is also stored in the journal table, which doubles the workload. Since the transient journal is stored in the DBC database, we also risk running out of permanent space; in that case, all workloads relying on the transient journal are aborted and errors are returned to the clients.
The impact of a skewed base table is multiplied by the Transient Journal, since it will also be skewed. If, for example, a load into a skewed table fails, the rollback typically takes a very long time and can only be terminated by force (which often leaves an inconsistent table). Remember, the transient journal is AMP-local: for any rollback on a skewed table, the AMPs holding the most rows must do most of the work.
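To see how unevenly a table is spread across the AMPs, a query against DBC.TableSizeV can help; the database and table names below are placeholders:

```sql
-- Compare the biggest AMP's share against the average per-AMP share
SELECT DatabaseName,
       TableName,
       MAX(CurrentPerm) AS MaxAmpPerm,
       AVG(CurrentPerm) AS AvgAmpPerm,
       100 * (1 - AVG(CurrentPerm) / NULLIF(MAX(CurrentPerm), 0)) AS SkewPct
FROM DBC.TableSizeV
WHERE DatabaseName = 'mydb'    -- placeholder
  AND TableName    = 'target'  -- placeholder
GROUP BY 1, 2;
```

A SkewPct near 0 means an even distribution; a high value means a few AMPs hold most of the rows and will dominate any rollback.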
Teradata offers some techniques that bypass the transient journal or store only a single row in it. There are scenarios where a row-by-row rollback is not needed. In the following, I will show you some tricks to improve performance and avoid these issues.
INSERT into an empty table
Here is a simple example of how slightly rewriting a query can avoid the transient journal. Consider two consecutive INSERT INTO statements: the first targets an empty table and therefore skips row-level journaling, but the second inserts into a now-populated table and must log every row:
INSERT INTO target SELECT col FROM table1; -- target is empty: no row-by-row journaling
INSERT INTO target SELECT col FROM table2; -- target is now populated: every row is journaled
If we want to avoid the transient journal for both data sets, we can rewrite the query as a single INSERT INTO an empty table, using UNION ALL:
INSERT INTO target SELECT col FROM table1 UNION ALL SELECT col FROM table2;
Replace UPDATE with INSERT INTO … SELECT into a copy of the target table
We should avoid UPDATE statements on large skewed tables: they perform poorly and make poor use of Teradata's parallel architecture.
UPDATE target SET col2 = 1; -- assumption: col2 is skewed
The method shown below is better because the transient journal is bypassed, and a risky (and possibly long-running) rollback is avoided. The transient journal is barely involved because we write into an empty table: only a single journal row is needed, recording that the whole table can simply be deleted in case of a rollback:
CREATE TABLE copy_of_target AS target WITH NO DATA; -- empty copy with the same structure
INSERT INTO copy_of_target SELECT col1, 1 AS col2 FROM target;
DROP TABLE target;
RENAME TABLE copy_of_target TO target;
A BTEQ trick
BTEQ allows us to avoid the transient journal for multiple INSERT statements into the same table: if the semicolon ending one statement is placed at the beginning of the line containing the next statement, BTEQ sends all of them as a single multi-statement request. Teradata then executes them as one transaction into an (initially) empty table and can skip row-level journaling:
INSERT INTO target SELECT * FROM table1 -- no semicolon at the end of this line!
;INSERT INTO target SELECT * FROM table2; -- the semicolon has to be the first character of the line!
Load Utilities and the Teradata Transient Journal
Teradata's bulk load utilities (the TPT Load operator, FastLoad, MultiLoad) do not use the transient journal and are better suited for loading large amounts of data than transactional loads such as BTEQ or TPump. You can read about the scenarios in which bulk loads perform better than transactional loads in the following article:
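As an illustration, here is a minimal FastLoad script sketch; the logon string, database, input file, and column layout are assumptions and must be adapted. Note that FastLoad requires the target table to be empty:

```sql
.LOGON tdpid/user,password;              -- assumed logon string
DATABASE mydb;                           -- assumed target database
BEGIN LOADING target
    ERRORFILES target_err1, target_err2; -- error tables created by FastLoad
SET RECORD VARTEXT ",";                  -- comma-separated input
DEFINE col1 (VARCHAR(20)),
       col2 (VARCHAR(20))
FILE = data.csv;                         -- assumed input file
INSERT INTO target (col1, col2) VALUES (:col1, :col2);
END LOADING;
.LOGOFF;
```

Because FastLoad writes whole data blocks instead of individual rows, no row-level transient journal entries are created.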