Isolation Levels and their Impact on Performance & Concurrency
Isolation is the property that determines how transaction integrity is visible to database users. It can ensure that transactions executed concurrently deliver the same result as if they would be executed serially.
However, the concurrency of transactions can be increased by abandoning this requirement. This can increase the performance, but at the same time, one accepts inconsistent results.
Teradata offers different levels of isolation:
Read and write locks are used in a way that users only see consistent results, but providing the lowest level of concurrency and performance.
- Read committed
Teradata Load isolation, the topic of this blog post. Allows for consistent reads while the data is being changed, without causing a lock contention.
- Read uncommitted
Dirty reads, users might see inconsistent results, but high levels of concurrency and highest performance.
In the next chapter, we will describe in detail how these levels of isolation are used in Teradata.
Introduction to Teradata Load Isolation
Transaction management takes care that as many transactions as possible can access (SELECT) and change (UPDATE, INSERT, DELETE) the same database object concurrently. By serializing the access and change operations on the same database object, data integrity is ensured.
Consider the following scenarios:
- Transaction A wants to read from a table.
- Transaction B wants to update rows in the same table.
Depending on which transaction is served as the first, one of the below serial actions will take place:
- A gets the read lock on the table, B has to wait until A finishes retrieving the rows or
- B gets the write lock; A has to wait until the update finishes.
Before Teradata 15.10, there is already a method for a transaction to read from a table while a concurrent change operation occurs: the so-called dirty read; the dirty read allows the use of uncommitted rows. In the below figure, at each point in time, the uncommitted data can be read by users (which might be rolled back):
This feature would mean for our previously mentioned scenario, that A and B can take place at the same time, but A might retrieve uncommitted rows. It will happen if the changing transaction B is rolled back after transaction A was reading the rows. The dirty read is implemented in Teradata with the LOCKING FOR ACCESS modifier:
LOCKING TheTable FOR ACCESS
SELECT * FROM TheTable WHERE columnA = 10;
Teradata implements the access modifier on table level and row level, locking only one row or a small set of rows. Here is an example of the syntax for row-level locking:
LOCKING ROW FOR ACCESS
SEL * FROM TheTable
WHERE columnA = 10;
Keep in mind that Teradata transaction management may choose to upgrade the lock level if required (such as from row-level lock to table level lock).
Access locking allows for higher performance and concurrency. It should be used whenever consistency is not a must, as read operations don’t have to wait for the change operations to finish and vice versa.
It can be ok to use access locking for highly aggregated reports but may be inadequate for frequently accessed tables that need 100% integrity all the time.
Teradata Load isolation was introduced with Teradata 15.10 and gave us the possibility to use committed rows while another transaction is changing them at the same time. Load isolation adds data integrity to the functionality the LOCKING FOR ACCESS modifier already provides.
The main reason to use load isolation is the same for dirty reads: Performance improvement by increasing the workload concurrency.
Teradata implements load isolation on table level, and it can be used in our queries by adding the new LOAD COMMITTED modifier. In contrast, “locking for access” is applied only on the query level (in your SQL statement or view definition).
Teradata added several features to support load isolation:
- A new attribute is stored in the table header, which identifies a table capable of load isolation.
- All table rows are versioned, using the internal “load id” column to find the different row versions uniquely.
- Each table with load isolation keeps the last committed “load id” values in the data dictionary.
The reading of committed row versions only is done by the system using the “last committed load id” stored in the data dictionary. INSERT, UPDATE, and DELETE statements are implemented in the following way:
- INSERT: The new row is added with the new “load id.”
- UPDATE: A new version of the row is created (and identified by the new “load id”)
- DELETE: The row is not physically deleted. Only the “load id” of the DELETE statement is added (and kept together with the “load id” of the original INSERT). These logically deleted rows are not removed automatically from the table, but Teradata introduced a new ALTER TABLE feature to execute physical deletion manually.
Teradata Load isolation is only available for regular tables. VOLATILE, ERROR, QUEUE, TEMPORARY, or GLOBAL TEMPORARY TRACE tables cannot be defined as load isolated tables.
Furthermore, the following restrictions exist:
- Column partition tables cannot be load isolated.
- Hash indexes are not available for load isolated tables.
- A compressed join index is not allowed on a load isolated table.
- Permanent journaling is not available for load isolated tables.
- Load isolated tables cannot be part of a replication group.
Teradata Load Isolation and Indexing
Teradata Load isolation can be (or is automatic) inherited by the table’s secondary indexes and join indexes, allowing concurrent index reads of committed rows.
The USI of a load isolated table always carries the commit property of the base table row in the index row. Following two statements can be used to add the USI:
CREATE UNIQUE INDEX (PK) ON TheTable;
CREATE UNIQUE INDEX (PK) WITH LOAD IDENTITY ON TheTable;
In both cases, a concurrent USI read can continue to get committed data via a single USI row access (no base table access needed).
If the NUSI of a load isolated table carries the commit property for the base table ROWIDs, the index alone can cover the query. Otherwise, the execution plan involves access to the base table. The below syntax defines behavior:
CREATE INDEX (PK) WITH LOAD IDENTITY ON TheTable; ON TheTable; -> Base table access can be avoided if NUSI is covering
CREATE INDEX (PK) WITH NO LOAD IDENTITY ON TheTable; -> Always requires a base table access
As Join indexes are quite similar to regular permanent tables, any join index defined on the top of a regular table automatically becomes a load isolated join index. No special syntax for the join index DDL is required.
Usage of Load Isolated Tables
As load isolated tables need more permanent tablespace because:
- Versions of each row are stored and
- The necessary other information (load ids) consumes eight bytes per row.
Teradata Load isolation can be turned off temporarily to avoid versioning during a considerable change operation.
In this case, exclusive locks have to be applied to avoid readers using the LOAD COMMITTED modifier reading uncommitted rows.
Teradata Load isolation is applied during table creation:
CREATE TABLE TheTable, WITH CONCURRENT ISOLATED LOADING FOR ALL
PK INTEGER NOT NULL,
) UNIQUE PRIMARY INDEX (PK);
The FOR ALL level allows INSERT, DELETE, and UPDATE operations to be load isolated.
If it’s ensured that mainly INSERT operations occur on the load isolated table, the FOR INSERT level is the right choice. No logically deleted rows are kept, and no versions of updated rows. Read performance is better when compared against fully load isolated tables.
If you want to turn off Teradata load isolation temporarily, the FOR NONE level can be used in the ALTER TABLE statement. Any permanent table can be changed from load isolated to regular and vice versa.
After the load isolated table was defined, it can be accessed by using the LOAD COMMITTED modifier:
CREATE VIEW TheTableV AS LOCKING TheTable FOR LOAD COMMITTED
SELECT * FROM TheTable WHERE columnA=10;
Read operations on this view will only return the last committed rows (identified by the system accessing the data dictionary).
If wished, the most recent – but maybe uncommitted – rows can be accessed from the load isolated table TheTable, by using the LOCKING FOR ACCESS modifier:
LOCKING TheTable FOR ACCESS
SELECT * FROM TheTable WHERE columnA=10;
Teradata Load Isolation Administrative Tasks
The system does not automatically delete logically deleted rows. A newly introduced ALTER TABLE statement has to be applied:
ALTER TABLE TheTable RELEASE DELETED ROWS;
Keep in mind that this statement requires an exclusive lock on the load isolated table. Additionally, statistics should be refreshed after the cleanup.
Teradata Load isolation allows increasing concurrency by keeping committed versions of the table rows. It is a huge step forward in Teradata transaction management (although other database vendors, such as Oracle, are using the “committed read” approach already long before Teradata introduced this feature).
We like the possibility to define Teradata load isolation on a “per table” level and the opportunity to turn off load isolations for certain operations (which might heavily affect performance and disk space).
I’m not too fond of the lack of an automatic garbage collection for deleted rows. Monitoring the load isolated tables for deleted logical rows and table space consumed by these rows seems an entirely unneeded administration task.
It is even worse: while permanent tables can quite comfortably be administrated with simple ALTER TABLE statements, join indexes have to be cleaned by using stored procedures.
I hope we will see some improvement here in the future.
For more details, you can refer to the official Teradata documentation: