December 14

0 comments

Teradata Referential Integrity

By DWH Pro Admin

December 14, 2019


Introduction to Teradata Referential Integrity

What is Teradata Referential Integrity?

Referential integrity ensures that a value created in a foreign key (FK) column of a referencing (child) table also exists as a primary key (PK) in the referenced (parent) table. In Teradata, the parent table's column does not have to be specified explicitly as PK, but each unique NOT NULL column can be referenced by an FK.teradata referential integrity

What are the Reasons for using Teradata Referential Integrity Features?

1. Ensure data integrity and consistency by preventing that DELETE, UPDATE or INSERT statements can corrupt the database
2. No coding required to manually ensure referential integrity
3. Teradata referential integrity handling is optimized and therefore performs better than programming it yourself using SQL or another coding method.
4. Referential Integrity helps the Teradata optimizer to create better execution plans by eliminating unnecessary joins.

What are the requirements for the primary key column of the parent table?

This column must be UNIQUE. This is given if this column is either defined as a primary key or has a UNIQUE constraint, or if a unique index exists (UPI, USI).

What happens if I delete a Primary Key value from the parent table and the child table refers to this value?

This is prevented by Teradata. A Primary Key value can only be deleted or changed if no Foreign Key refers to this value.

What is the syntax for referential integrity?

CREATE TABLE Sale
(
SalesId INTEGER NOT NULL
CustomerId INTEGER REFERENCES Customer (CustmerId),
 SalesDate DATE,
Price DECIMAL(10,2))
PRIMARY INDEX (SalesId)
);

Can the foreign key columns and primary key columns of the child and parent tables have different data types?

No. Both data type and case sensitivity must be identical.

Teradata implements 3 Types of Referential Integrity

1. What is Standard Referential Integrity?

With the Standard Referential Integrity, every INSERT, DELETE, or UPDATE of a Row is checked immediately whether the referential integrity is given.

An additional reference index sub-table is required to ensure referential integrity.

If the referential integrity is violated, the statement is not executed and an error is reported.

Since the referential integrity check is row-by-row, a performance loss must be expected.

The load tools such as TPT, Fastload, and Multiload cannot be used for tables to which standard Referential Integrity is applied.

The Reference Index Sub-Table

For each referential constraint on a child table (that is the one with the FK) a reference index sub-table is required.

The reference index sub-table is used to check whether an operation (UPDATE, INSERT, or DELETE) would violate the referential integrity, and to prevent this.

The reference index sub-table is distributed to all AMPs using hashing, just like a base table.

The rowhash is determined by the foreign key value. This ensures that the index rows of a foreign key are always present at the same AMP as the primary key rows (defined by UPI, USI, or explicitly defined as PK).

Hashing the foreign key value ensures that the referential integrity check can be performed locally on each AMP without copying rows. This is an important performance aspect

A reference index row contains the following information:

  • The ROWID of the foreign key value
  • The foreign key value itself
  • The number of rows in the child table containing this foreign key value
  • The validity flag, which indicates if the foreign key value is valid (i.e. the value is available in the column of the parent table). We will discuss in the next section, how it can happen that referential integrity is not given even if it is defined.

Adding Standard Referential Integrity to a populated Table – the Creation of Referential Integrity Violations

It is possible to apply standard referential integrity to a populated table:

ALTER TABLE SALE
ADD CONSTRAINT FK_SALES FOREIGN KEY (CUSTOMER)
REFERENCES CUSTOMER(CustomerId);

The following happens if there is no primary key in the parent table for a certain foreign key value:

  • The validity flag of the reference index sub-table row is set to “false”
  • INSERT, UPDATE, or DELETE statements can still be executed, even if there are invalidated index sub-table rows, i.e. problems with the referential integrity in the child table exist.
  • No new rows with the same invalid value can be inserted!
  • An error table that has the same structure as the child table is created automatically which receives the invalid rows. The error table is called exactly like the child table but with a numeric post-fix e.g. for SALE, there is the error table SALE_0 being created.
  • If there are several foreign key constraints on a child table, an error table is created for each foreign key constraint (these are numbered accordingly).
teradata ri
Error Table Handling for Standard RI

Purge Referential Integrity Violations

As we have seen in the previous section, when you add standard referential integrity to a populated table, it may happen that the referential integrity is not fulfilled for all rows.

Here are the methods of how you can restore data integrity after adding referential integrity checking:

ALTER TABLE SALE
ADD CONSTRAINT FK_SALES FOREIGN KEY (CUSTOMER)
REFERENCES CUSTOMER(CustomerId);
  1. UPDATE SALE SET CustomerId = <ValidCustomerId> WHERE CustomerId = <InvalidCustomerId>;
  2. UPDATE SALE SET CustomerId = NULL WHERE SalesId IN (SELECT SalesId FROM SALES_0);
  3. DELETE FROM SALE WHERE SalesId IN (SELECT SalesId FROM SALES_0);
  4. INSERT INTO Customer VALUES (<MissingCustomerId>,…);

The syntax for adding Standard Referential Integrity

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES Customer (CustomerId);

2. What is Batch Referential Integrity?

The difference to standard referential integrity is that it does not check row-by-row, but checks the entire transaction. If only one row does not pass the test, the entire transaction is rolled back.

The advantage of this method that the performance loss can be considerably lower. Teradata also does not need an index sub-table for Batch Referential Integrity.

As with Standard Referential Integrity, the loading tools TPT, Fastload, and Multiload cannot be used on tables on which Batch Referential Integrity is applied.

Since no reference index sub-table is used, the referential integrity of the transaction is checked in a different way: by joining the parent and child table.

If the referential integrity is violated by at least one row, the entire transaction is rolled back.

Adding Batch Referential Integrity to a populated Table – the Creation of Referential Integrity Violations

For populated tables, if the foreign key has violations, the ALTER TABLE
will fail with Error 3513. There is no error table written.

The syntax for adding Batch Referential Integrity – WITH CHECK OPTION

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH CHECK OPTION Customer (CustomerId);

3. What is Soft Referential Integrity?

Soft referential integrity does not ensure referential integrity. Although referential integrity between tables is defined, it is not enforced.

Why does soft referential integrity exist then? Soft referential integrity exists to help the optimizer create better execution plans without the overhead of batch or standard referential integrity.

Since no check is performed, all loading tools (Fastload, TPT, Multiload) can be used without restrictions.

With Soft Referential Integrity you promise that data integrity is given. The optimizer trusts that data integrity is given. It is possible that a wrong result will be returned for a query if this is not the case!

The Syntax for adding Batch Referential Integrity – WITH NO CHECK OPTION

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH NO CHECK OPTION Customer (CustomerId);

Referential Integrity in Comparison

Let's now compare a query over two tables between which referential integrity is defined, and consider how exactly join optimization takes place, and what the result is, depending on the type of referential integrity used.

We select only the columns of the child table (Sale) so that the optimizer can do join elimination.

SELECT s.* FROM Customer c INNER JOIN Sale s ON s.CustomerId = c.CustomerId;

No Referential Integrity is defined

In this case, there is always a join taking place (although only columns of the sale table are selected).

If there is a CustomerId in the Sale Table that does not exist in the Customer Table, these rows are lost.

Standard Referential Integrity is defined

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES Customer (CustomerId);

No join takes place.

Rows of the Sale Table are returned, for which the CustomerId is not in the Customer Table.

Remember:
This inconsistency could have been caused by adding the referential integrity to an existing and populated table.


It is therefore always important to check if there are rows in the error table and correct the errors accordingly or the result might be unexpected.

Batch Referential Integrity is defined

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH CHECK OPTION Customer (CustomerId);

No join takes place.

Since batch referential integrity prevents referential integrity from being violated, there can be no unexpected results.

Soft Referential Integrity is defined

ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH NO CHECK OPTION Customer (CustomerId);

No join takes place.

Rows of the Sale Table are returned, for which the CustomerId is not in the Customer Table.

When to use what kind of Referential Integrity?

Performance Considerations

  • Soft Referential Integrity does not cause performance deterioration. Use it if you are sure that the referential integrity of your data is ensured to help the optimizer create a better execution plan.
  • If a row-by-row INSERT, UPDATE, or DELETE occurs, the performance loss is similar for Standard and Batch Referential Integrity. Use it when mainly single rows are processed
  • If many rows are processed at once with INSERT/DELETE/UPDATE, the performance loss is usually higher with Standard Referential Integrity than with Batch Referential Integrity if no referential integrity violations exist
  • If many rows are processed at once with INSERT/DELETE/UPDATE and referential integrity violation occurs frequently, the rollback caused by the Batch Referential Integrity might cause substantial performance issues.
  • Use batch referential integrity for batch inserts of rows only if there are usually no instances of referential integrity, as the rollback in case of an error might become a performance issue.

Feature Comparison

teradata batch ri
The most important features compared

Teradata Referential Integrity – Why is it so rarely used?

In a data warehouse, it is typically not so critical if the referential integrity is not 100% given.

One would rather accept a certain inaccuracy than the performance loss caused by standard or batch referential integrity.

Soft referential integrity seems to be a solution, but only if the referential integrity of the data is 100% fulfilled.

This may involve a lot of effort in the ETL process to ensure this. But only then it makes sense to use soft referential integrity, because who can afford that queries return the wrong results?

Read also:

Teradata Documentation on this topic

DWH Pro Admin


Teradata and Data Warehouse Specialist with more than 20 years of experience

You might also like

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Never miss a good story!

 Subscribe to our newsletter to keep up with the latest trends!

>