Teradata Load Isolation
Introduction to Teradata Load Isolation
Transaction management takes care that as many transactions as possible can access (SELECT) and change (UPDATE, INSERT, DELETE) the same database object concurrently. By serializing the access and change operations on the same database object, data integrity is ensured.
Consider the following scenarios:
- Transaction A wants to read from a table
- Transaction B wants to update rows in the same table
Depending which transaction is served as the first, one of below serial actions will take place:
- A gets the read lock on the table, B has to wait until A finishes retrieving the rows or
- B gets the write lock; A has to wait until the update finishes
Before Teradata 15.10, there is already a method for a transaction to read from a table while a concurrent change operation takes place: the so-called dirty read; the dirty read allows to use uncommitted rows.
This feature would mean for our previous mentioned scenario, that A and B can take place at the same time, but A might retrieve uncommitted rows. It will definitely happen if the changing transaction B is rolled back after transaction A was reading the rows. The dirty read is implemented in Teradata with the LOCKING FOR ACCESS modifier:
LOCKING TheTable FOR ACCESS
SELECT * FROM TheTable WHERE columnA = 10;
Teradata implements the access modifier on table level and on row level which locks only one row or a small set of rows. Here is an example of the syntax for row level locking:
LOCKING ROW FOR ACCESS
SEL * FROM TheTable
WHERE columnA = 10;
Keep in mind that Teradata transaction management may choose to upgrade the lock level if required (such as from row level lock to table level lock).
Access locking allows for a higher performance and should be used whenever possible: Read operations don’t have to wait for the change operations to finish and vice versa. Unfortunately, this approach is not always applicable.
It can be ok to use access locking for highly aggregated reports but may be inadequate for frequently accessed tables which need 100% integrity all the time.
Load isolation was introduced with Teradata 15.10 and gives us the possibility to use committed rows while another transaction is changing them at the same time. Load isolation adds data integrity to the functionality the LOCKING FOR ACCESS modifier already provides.
The main reason to use load isolation is, of course, the same we had for dirty reads: Performance improvement by increasing the workload concurrency.
Teradata implements load isolation on table level, and it can be used in your queries by adding the new LOAD COMMITTED modifier. In contrast, “locking for access” is applied only on query level (in your SQL statement or view definition).
Several features were added to support load isolation:
- A new attribute is stored in the table header which identifies a table capable of load isolation
- All table rows are versioned, using the internal “load id” column to be able to find the different row versions uniquely
- Each table with load isolation keeps the last committed “load id” values in the data dictionary
The reading of committed row versions only is done by the system, by using the “last committed load id” which is stored in the data dictionary. INSERT, UPDATE and DELETE statements are implemented in the following way:
- INSERT: The new row is added with the new “load id.”
- UPDATE: A new version of the row is created (and identified by the new “load id”)
- DELETE: The row is not physically deleted, only the “load id” of the DELETE statement is added (and kept together with the “load id” of the original INSERT). These logically deleted rows are not removed automatically from the table, but a new ALTER TABLE feature was introduced to execute physical deletion manually.
Load isolation is only available for regular tables. VOLATILE, ERROR, QUEUE, TEMPORARY or GLOBAL TEMPORARY TRACE tables cannot be defined as load isolated tables.
Furthermore, the following restrictions exist:
- Column partition tables cannot be load isolated.
- Hash indexes are not available for load isolated tables.
- A compressed join index is not allowed on a load isolated table.
- Permanent journaling is not available for load isolated tables.
- Load isolated tables cannot be part of a replication group.
Teradata Load Isolation and Indexing
Of course, load isolation can be (or is automatically) inherited by the table’s secondary indexes and join indexes, allowing concurrent index reads of committed rows.
The USI of a load isolated table always carries the commit property of the base table row in the index row. Following two statements can be used to add the USI:
CREATE UNIQUE INDEX (PK) ON TheTable;
CREATE UNIQUE INDEX (PK) WITH LOAD IDENTITY ON TheTable;
In both cases, a concurrent USI read can continue to get committed data via a single USI row access (no base table access needed).
If the NUSI of a load isolated table carries the commit property for the base table ROWIDs, the index alone can be used to cover the query. Otherwise, the execution plan involves access to the base table. Behaviour is defined by below syntax:
CREATE INDEX (PK) WITH LOAD IDENTITY ON TheTable; ON TheTable; -> Base table access can be avoided if NUSI is covering
CREATE INDEX (PK) WITH NO LOAD IDENTITY ON TheTable; -> Always requires a base table access
As Join indexes are quite similar to regular permanent tables, any join index defined on the top of a regular table becomes a load isolated join index automatically. No special syntax for the join index DDL is required.
Usage of Load Isolated Tables
As load isolated tables need more permanent table space because:
- Versions of each row are stored and
- The necessary other information (load ids) consumes eight bytes per row,
Load isolation can be turned off temporarily to avoid versioning during huge change operation.
In this case, exclusive locks have to be applied to avoid that readers who are using the LOAD COMMITTED modifier are reading uncommitted rows.
Load isolation is turned on during table creation:
CREATE TABLE TheTable, WITH CONCURRENT ISOLATED LOADING FOR ALL
PK INTEGER NOT NULL,
) UNIQUE PRIMARY INDEX (PK);
The FOR ALL level allows INSERT, DELETE and UPDATE operations to be load isolated.
If it’s ensured that mainly INSERT operations occur on the load isolated table, the FOR INSERT level is the right choice as no logically deleted rows are kept and no versions of updated rows. Read performance is better when compared against fully load isolated tables.
If you want to turn off load isolation temporarily, the FOR NONE level can be used in the ALTER TABLE statement. Any permanent table can be changed from load isolated to regular and vice versa.
After the load isolated table was defined, it can be accessed by using the LOAD COMMITTED modifier:
CREATE VIEW TheTableV AS LOCKING TheTable FOR LOAD COMMITTED
SELECT * FROM TheTable WHERE columnA=10;
Read operations on this view will only return the last committed rows (which are identified by the system accessing the data dictionary).
If wished, the most recent – but maybe uncommitted – rows can be accessed from the load isolated table TheTable, by using the LOCKING FOR ACCESS modifier:
LOCKING TheTable FOR ACCESS
SELECT * FROM TheTable WHERE columnA=10;
Load Isolation Administrative Tasks
The system does not automatically delete logically deleted rows. A newly introduced ALTER TABLE statement has to be applied:
ALTER TABLE TheTable RELEASE DELETED ROWS;
Keep in mind, that this statement requires an exclusive lock on the load isolated table. Additionally, statistics should be refreshed after the cleanup.
Load isolation allows increasing concurrency by keeping committed versions of the table rows and are a huge step forward in Teradata transaction management (although other database vendors are using the “committed read” approach already long before Teradata introduced this feature).
I like the possibility to define load isolation on a “per table” level and the opportunity to turn off load isolations for certain operations (which might heavily affect performance and disk space).
I don’t like the lack of an automatic garbage collection. Monitoring the load isolated tables for deleted logical rows and table space consumed by these rows seems an entirely unneeded administration task. It is even worse: while permanent tables can quite comfortable be administrated with simple ALTER TABLE statements, join indexes have to be cleaned by the usage of stored procedures.
I hope we will see some improvement here in the future