Introduction to Teradata Referential Integrity
What is Teradata Referential Integrity?
Referential integrity ensures that a value created in a foreign key (FK) column of a referencing (child) table also exists as a primary key (PK) in the referenced (parent) table. The parent table’s column does not have to be specified explicitly as PK in Teradata, but an FK can reference each unique NOT NULL column.
What are the Reasons for using Teradata Referential Integrity Features?
1. Ensure data integrity and consistency by preventing that DELETE, UPDATE, or INSERT statements can corrupt the database
2. No coding is required to ensure referential integrity manually
3. Teradata referential integrity handling is optimized and performs better than programming it yourself using SQL or another coding method.
4. Referential Integrity helps the Teradata optimizer to create better execution plans by eliminating unnecessary joins.
What are the requirements for the primary key column of the parent table?
This column must be UNIQUE. This is given if a column is either defined as a primary key or has a UNIQUE constraint or a unique index (UPI, USI).
What happens if I delete a Primary Key value from the parent table and the child table refers to this value?
Teradata prevents this. A Primary Key value can only be deleted or changed if no Foreign Key refers to this value.
What is the syntax for referential integrity?
CREATE TABLE Sale
(
SalesId INTEGER NOT NULL
CustomerId INTEGER REFERENCES Customer (CustmerId),
 SalesDate DATE,
Price DECIMAL(10,2))
PRIMARY INDEX (SalesId)
);
Can the foreign and primary key columns of the child and parent tables have different data types?
No. Both data type and case sensitivity must be identical.
Teradata implements 3 Types of Referential Integrity.
1. What is Standard Referential Integrity?
With the Standard Referential Integrity, every INSERT, DELETE, or UPDATE of a Row is checked immediately to determine whether the referential integrity is given.
An additional reference index sub-table is required to ensure referential integrity.
If referential integrity is violated, the statement is not executed, and an error is reported.
Since the referential integrity check is row-by-row, a performance loss must be expected.
The load tools such as TPT, Fastload, and Multiload cannot be used for tables to which standard Referential Integrity is applied.
The Reference Index Sub-Table
A reference index sub-table is required for each referential constraint on a child table (the one with the FK).
The reference index sub-table is used to check whether an operation (UPDATE, INSERT, or DELETE) would violate the referential integrity and prevent it.
The reference index sub-table is distributed to all AMPs using hashing, just like a base table.
The foreign key value determines the rowhash. This ensures that a foreign key’s index rows are always present at the same AMP as the primary key rows (defined by UPI, USI, or explicitly as PK).
Hashing the foreign key value ensures that the referential integrity check can be performed locally on each AMP without copying rows. This is an important performance aspect
A reference index row contains the following information:
- The ROWID of the foreign key value
- The foreign key value itself
- The number of rows in the child table containing this foreign key value
- The validity flag indicates if the foreign key value is valid (i.e., the value is available in the parent table column). The following section will discuss how referential integrity may not be given, even if defined.
Adding Standard Referential Integrity to a populated Table – the Creation of Referential Integrity Violations
It is possible to apply standard referential integrity to a populated table:
ALTER TABLE SALE ADD CONSTRAINT FK_SALES FOREIGN KEY (CUSTOMER) REFERENCES CUSTOMER(CustomerId);
The following happens if there is no primary key in the parent table for a particular foreign key value:
- The validity flag of the reference index sub-table row is set to “false”
- INSERT, UPDATE, or DELETE statements can still be executed, even if there are invalidated index sub-table rows, i.e., the problem with the referential integrity in the child table.
- No new rows with the same invalid value can be inserted!
- An error table with the same structure as the child table is created automatically, receiving invalid rows. The error table is called exactly like the child table, but with a numeric postfix, e.g., for SALE, there is the error table SALE_0 being created.
- If there are several foreign key constraints on a child table, an error table is created for each foreign key constraint (these are numbered accordingly).

Purge Referential Integrity Violations
As seen in the previous section, when you add standard referential integrity to a populated table, referential integrity maybe not be fulfilled for all rows.
Here are the methods of how you can restore data integrity after adding referential integrity checking:
ALTER TABLE SALE ADD CONSTRAINT FK_SALES FOREIGN KEY (CUSTOMER) REFERENCES CUSTOMER(CustomerId);
- UPDATE SALE SET CustomerId = <ValidCustomerId> WHERE CustomerId = <InvalidCustomerId>;
- UPDATE SALE SET CustomerId = NULL WHERE SalesId IN (SELECT SalesId FROM SALES_0);
- DELETE FROM SALE WHERE SalesId IN (SELECT SalesId FROM SALES_0);
- INSERT INTO Customer VALUES (<MissingCustomerId>,…);
The syntax for adding Standard Referential Integrity
ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES Customer (CustomerId);
2. What is Batch Referential Integrity?
The difference to standard referential integrity is that it does not check row-by-row but checks the entire transaction. The whole transaction is rolled back if only one row does not pass the test.
The advantage of this method is that the performance loss can be considerably lower. Teradata also does not need an index sub-table for Batch Referential Integrity.
As with Standard Referential Integrity, the loading tools TPT, Fastload, and Multiload cannot be used on Batch Referential Integrity tables.
Since no reference index sub-table is used, the transaction’s referential integrity is checked differently: by joining the parent and child table.
The entire transaction is rolled back if the referential integrity is violated by at least one row.
Adding Batch Referential Integrity to a populated Table – the Creation of Referential Integrity Violations
If the foreign key violates populated tables, the ALTER TABLE
will fail with Error 3513. There is no error table written.
The syntax for adding Batch Referential Integrity – WITH CHECK OPTION
ALTER TABLE SALE ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID) REFERENCES WITH CHECK OPTION Customer (CustomerId);
3. What is Soft Referential Integrity?
Soft referential integrity does not ensure referential integrity. Although referential integrity between tables is defined, it is not enforced.
Why does soft referential integrity exist then? Soft referential integrity exists to help the optimizer create better execution plans without the overhead of batch or standard referential integrity.
Since no check is performed, all loading tools (Fastload, TPT, Multiload) can be used without restrictions.
With Soft Referential Integrity, you promise that data integrity is given. The optimizer trusts that data integrity is given. The wrong result may be returned for a query if this is not the case!
The syntax for adding Batch Referential Integrity – WITH NO CHECK OPTION
ALTER TABLE SALE ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID) REFERENCES WITH NO CHECK OPTION Customer (CustomerId);
Referential Integrity in Comparison
Let’s now compare a query over two tables between which referential integrity is defined and consider how exactly join optimization takes place and the result, depending on the type of referential integrity used.
We select only the columns of the child table (Sale) so that the optimizer can do join elimination.
SELECT s.* FROM Customer c INNER JOIN Sale s ON s.CustomerId = c.CustomerId;
No Referential Integrity is defined.
There is always a join taking place (although only columns of the sale table are selected).
If there is a CustomerId in the Sale Table that does not exist in the Customer Table, these rows are lost.
Standard Referential Integrity is defined
ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES Customer (CustomerId);
No join takes place.
Rows of the Sale Table are returned, for which the CustomerId is not in the Customer Table.
Remember:
This inconsistency could have been caused by adding the referential integrity to an existing and populated table.
Therefore, it is always important to check if there are rows in the error table and correct the errors accordingly, or the result might be unexpected.
Batch Referential Integrity is defined.
ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH CHECK OPTION Customer (CustomerId);
No join takes place.
Since batch referential integrity prevents referential integrity from being violated, there can be no unexpected results.
Soft Referential Integrity is defined.
ALTER TABLE SALE
ADD CONSTRAINT SalesFK FOREIGN KEY (CustomerID)
REFERENCES WITH NO CHECK OPTION Customer (CustomerId);
No join takes place.
Rows of the Sale Table are returned, for which the CustomerId is not in the Customer Table.
When to use what kind of Referential Integrity?
Performance Considerations
- Soft Referential Integrity does not cause performance deterioration. Use it if you are sure that your data’s referential integrity is ensured to help the optimizer create a better execution plan.
- If a row-by-row INSERT, UPDATE, or DELETE occurs, the performance loss is similar for Standard and Batch Referential Integrity. Use it when mainly single rows are processed.
- Suppose many rows are processed at once with INSERT/DELETE/UPDATE. If no referential integrity violations exist, the performance loss is usually higher with Standard Referential Integrity than with Batch Referential Integrity Integrity.
- If many rows are processed at once with INSERT/DELETE/UPDATE, and referential integrity violation occurs frequently, the rollback caused by the Batch Referential Integrity might cause substantial performance issues.
- Use batch referential integrity for batch inserts of rows only if there are usually no referential integrity instances, as the rollback in case of an error might become a performance issue.
Feature Comparison

Teradata Referential Integrity – Why is it so rarely used?
It is typically not critical in a data warehouse if the referential integrity is not 100% given.
One would rather accept an inevitable inaccuracy than the performance loss caused by standard or batch referential integrity.
Soft referential integrity seems to be a solution, but only if the data’s referential integrity is 100% fulfilled.
This may involve a lot of effort in the ETL process to ensure this. But only then does it make sense to use soft referential integrity because who can afford that queries return the wrong results?
Read also: