The Teradata Transient Journal

Roland Wenzlofsky

June 23, 2022



Like most relational database systems, Teradata keeps a copy of each modified row for most DML statements. These copies are stored in the Transient Journal, a table that can only be read and modified by the system; as users, we have no access to it. The table is located in the DBC database, so we have to ensure that there is always enough space in DBC. Otherwise, transactions that need to write journal rows will fail.
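A simple way to keep an eye on this is a query against the standard Data Dictionary view DBC.DiskSpaceV; a minimal sketch (the warning threshold you apply is up to you):

-- Permanent space of the DBC database, summed over all AMPs
SELECT DatabaseName,
       SUM(MaxPerm)                    AS MaxPermBytes,
       SUM(CurrentPerm)                AS CurrentPermBytes,
       SUM(MaxPerm) - SUM(CurrentPerm) AS FreePermBytes
FROM   DBC.DiskSpaceV
WHERE  DatabaseName = 'DBC'
GROUP  BY DatabaseName;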

The Teradata Transient Journal is, like most Teradata tables, AMP-local. Each AMP manages its own transient journal, i.e., the transient journal rows are always located on the same AMP as the corresponding base table rows. The rows are distributed, as usual, by the same hashing algorithm that is used for normal tables.
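We can see this distribution ourselves, since the hash functions HASHROW, HASHBUCKET, and HASHAMP are the same ones Teradata uses to place the rows. A minimal sketch, assuming a hypothetical table mydb.target with primary index column pi_col:

-- Number of rows each AMP has to maintain (and, during a rollback, to journal) for mydb.target
SELECT HASHAMP(HASHBUCKET(HASHROW(pi_col))) AS amp_no,
       COUNT(*)                             AS row_cnt
FROM   mydb.target
GROUP  BY 1
ORDER  BY 2 DESC;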

In case of an error or if a transaction is aborted, the transient journal is used to roll back the changes: all changed rows are written back from the transient journal to the base table. This is an essential function of a relational database system; therefore, we cannot deactivate it. Without the transient journal, Teradata could not guarantee data integrity.

Of course, using the transient journal has a negative impact on performance. Therefore, there are techniques in Teradata that log only one record and thus bypass the row-by-row mechanism.

Especially when we process large tables (UPDATE, DELETE, INSERT), maintaining the transient journal can noticeably hurt performance because it then contains practically as many rows as the processed base table. Since the transient journal is stored in the DBC database, there is also the risk of running out of space; in that case, all workloads relying on the transient journal are aborted and return corresponding errors.

Finally, the effect of a skewed base table is exacerbated by the Transient Journal, since it is skewed in the same way. If, for example, a load into a large skewed table is aborted, the rollback typically takes a very long time and can often only be stopped by canceling it by force, which frequently leaves an inconsistent table. Remember that the transient journal is AMP-local: for any rollback on a skewed table, the AMPs holding the most rows have to do a disproportionate share of the work.
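Before updating or loading a large table, it is therefore worth checking how evenly it is distributed across the AMPs; a minimal sketch using the Data Dictionary view DBC.TableSizeV (database and table names are placeholders):

-- Per-AMP space of a table: a large gap between MIN and MAX indicates skew
SELECT DatabaseName,
       TableName,
       MIN(CurrentPerm) AS MinPermPerAmp,
       AVG(CurrentPerm) AS AvgPermPerAmp,
       MAX(CurrentPerm) AS MaxPermPerAmp
FROM   DBC.TableSizeV
WHERE  DatabaseName = 'mydb'
AND    TableName    = 'target'
GROUP  BY 1, 2;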

Teradata offers several techniques that either bypass the transient journal entirely or require only a single row to be logged. They apply in scenarios where a row-by-row rollback is not needed; in the following, I will show you some of them.

INSERT into an empty table

Here is a simple example of how the transient journal can be skipped by slightly changing a query. First, the variant that needs the transient journal and logs the changes row by row: the first INSERT INTO statement does not use the transient journal because the target is still empty, but the second INSERT INTO statement has to use it:

INSERT INTO target SELECT col FROM table1; -- target is empty: only the 'table was empty' marker is journaled
INSERT INTO target SELECT col FROM table2; -- target is no longer empty: every inserted row is journaled

If we want to avoid the transient journal for both INSERT INTO statements, we can achieve this with the following minor change, which fills the still-empty table in a single statement:

INSERT INTO target SELECT col FROM table1 UNION ALL SELECT col FROM table2;
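Whether the optimizer really takes the journal-friendly path can be verified by prefixing the statement with EXPLAIN and inspecting the plan text:

EXPLAIN
INSERT INTO target SELECT col FROM table1 UNION ALL SELECT col FROM table2;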

Replace UPDATE with INSERT INTO … SELECT into a copy of the target table

We should avoid direct UPDATE statements on large skewed tables, as they perform poorly and load the parallel system unevenly:

UPDATE target SET col2 = 1;

Instead, the following variant is a good substitute. The advantage is that no row-by-row journaling is needed, because we write into an empty table (a single journal row with the information “the table was empty before” is sufficient for a rollback):

INSERT INTO copy_of_target SELECT col1, 1 AS col2 FROM target;
DROP TABLE target;
RENAME TABLE copy_of_target TO target;
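Note that copy_of_target must already exist as an empty table with the same structure as the target before the INSERT runs. A minimal sketch of how to create it (the table names are, of course, just examples):

CREATE TABLE copy_of_target AS target WITH NO DATA; -- empty copy with the same structure as target

Depending on the setup, statistics, secondary indexes, and access rights may have to be re-created after the rename.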

A BTEQ trick

BTEQ allows us to avoid the transient journal for multiple INSERT statements into the same (initially empty) table: if we place the semicolon of each statement at the beginning of the following line, BTEQ sends the statements as a single multi-statement request, and only the “table was empty” marker has to be journaled. The final semicolon on a line of its own ends the request:

INSERT INTO target SELECT * FROM table1  -- no semicolon at the end of this line!
;INSERT INTO target SELECT * FROM table2 -- the semicolon has to be the first character of the line!
;

Load Utilities and the Teradata Transient Journal

None of the Teradata bulk load utilities (TPT Bulk Load, FastLoad, MultiLoad) uses a transient journal, which makes them preferable to transactional loads such as BTEQ or TPump under the right circumstances. Exactly when bulk loads perform better is covered in a separate article.
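To give an idea of what such a load looks like, here is a minimal FastLoad sketch; the TDPID, credentials, file name, table, and column definitions are placeholders and depend on your environment:

/* Minimal FastLoad script: loads a comma-delimited file into an empty target table */
.LOGON tdpid/user,password;
.SET RECORD VARTEXT ",";
/* With VARTEXT records, all input fields must be defined as VARCHAR */
DEFINE col1 (VARCHAR(20)),
       col2 (VARCHAR(20))
FILE = data.csv;
/* The target table must be empty; the two error tables must not exist yet */
BEGIN LOADING mydb.target
      ERRORFILES mydb.target_err1, mydb.target_err2;
INSERT INTO mydb.target (col1, col2)
VALUES (:col1, :col2);
END LOADING;
.LOGOFF;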


  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >