What are the Features of the Teradata Identity Columns?
What is a Teradata Identity column?
The Teradata identity columns feature allows generating unique numbers for each row inserted into a column. Identity columns can be used for transactional or bulk inserts.
What are the key benefits of a Teradata Identity Column?
• It is the easiest way to ensure row uniqueness without the performance disadvantage of unique constraints.
• Unique identifiers can be generated within the database and ensure integrity. The creation of unique identifiers on the application-level would be much more complicated.
• As a Primary Index, the Teradata Identity columns ensure even data distribution across the AMPs.
• The identity column can be used to create primary keys of entities.
• Teradata Identity columns simplify application logic as well as the maintenance tasks for database administrators.
Which data types can be used for Identity columns?
Integer Columns (INTEGER, BIGINT, etc.), including columns of type DECIMAL without decimal places (DECIMAL(10,0)).
What are the advantages compared to other methods of generating numbers?
No error-prone generation of unique values on the application level is required. Using the identity columns as a primary index also ensures a perfect distribution of the rows across all AMPs.
Is it guaranteed that no duplicates are created?
Uniqueness is only guaranteed if identity columns are created with the options GENERATED ALWAYS and NO CYCLE. These two options will be presented later in this article.
Are identity columns supported by the load utilities (Fastload, Multiload, etc.)?
Yes, identity columns are supported for both transactional inserts and bulk loads.
Are the numbers always nicely ascending and without gaps?
No, because the way the numbers are generated for performance reasons does not allow this. If you need ascending numbers without gaps, ROW_NUMBER() is more suitable.
What is the syntax for creating a table with Identity Column?
Here is an example:
CREATE TABLE Customer
CustomerId BIGINT GENERATED ALWAYS AS IDENTITY
(START WITH 1 INCREMENT BY 1 MINVALUE 100 NO CYCLE),
) PRIMARY INDEX (CustomerId);
What is one of the main applications for identity columns?
Creating Surrogate Keys.
What happens in the case of NO CYCLE when MAXVALUE is reached?
This error is reported: “*** Failure 7545 Numbering for Identity Column Customer.CustomerId is over its limit.”
Where can I see the value range of an identity column (current value, maximum value, minimum value, etc.)?
These values are stored in table DBC.IDCOL:
CREATE SET TABLE DBC.IDCOL
TableId BYTE(6) NOT NULL,
DatabaseId BYTE(4) NOT NULL,
AvailValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
StartValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
MinValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
MaxValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
Increment INTEGER FORMAT ‘–,—,—,–9’ NOT NULL,
Cyc CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL)
UNIQUE PRIMARY INDEX ( TableId );
Here is a query that lists all values:
CAST(dbase.databasename AS VARCHAR(255)) DatabaseName,
CAST(dbc.tvm.tvmname AS VARCHAR(255)) TableName
dbc.idcol.databaseid = dbc.tvm.databaseid
AND dbc.idcol.tableid = dbc.tvm.tvmid
tvm.DatabaseId = dbase.DatabaseId;
Wat Are The Most Important Options Of The Teradata Identity Column?
- GENERATED ALWAYS: Generate always a value, whether or not the query passes a value.
- GENERATED BY DEFAULT: Generate only a value if the column value delivered in the SQL insert statement is NULL. Used to copy rows to a table that has an identity column.
- START WITH: The first number to be used in the system-generated numeric sequence.
- INCREMENT BY: The step size by which each generated number gets incremented (one by default).
- MINVALUE: The minimum value to which a generated number can decrement. If MINVALUE is not specified -2,147,483,647 is used (the minimum value for INTEGER).
- MAXVALUE: The maximum value by which a generated number can increment.
- CYCLE: Defines whether generated values can be reused when their minimum or maximum is reached.
The Teradata Identity Column and Performance
When initially bulk-loading into a table with an Identity Column, there is some overhead involved, as each VPROC has to get a range of numbers from DBC.IdCol, and add them to its local cache.
But this overhead is only happening initially. The later handling of numbers is done from the cache.
Only when a VPROC runs out of numbers it will get another range of numbers. Creating a few thousand identity column values takes a few seconds.
Bulk insert performance can be optimized by increasing the DBSControl setting IdColBatchSize, which decreases the number of times the DBC.IdCol table has to be accessed to get a new range of numbers.
The performance improvement results from the fact that there are fewer or no updates on the DBC.IdCol table.
But we have to consider that due to larger batches many more numbers are lost during loading (or during a system restart). The optimal batch size should therefore be defined depending on the number of AMPs.
How are Teradata Identity Columns Generated?
How the Teradata identity columns’ values are generated depends on how the rows are inserted into a table.
- INSERT…SELECT: The values are cached on the AMPs
- Single Row or USING clause: The values are cached on the Parsing Engine (PE)
As VPROCs (AMPs or PEs) work in parallel and independently from each other, the numbers generated and inserted into a table are not in chronological order.
There can be gaps between the generated numbers. However, this will increase the likely-hood of having larger gaps in the figures in the case of a system restart, and between loads, when not all numbers are consumed.
Limitations Of Teradata Identity Columns
- Each table can only have one identity column.
*** Failure 3706 Syntax error: A table may not have more than one identity column.
- ALTER TABLE can’t add an identity column
*** ALTER TABLE Failed. 3706: Syntax error: Cannot add new Identity Column option
- Can’t be part of a composite primary or secondary index
*** Failure 5784 Illegal usage of Identity Column CustomerId.
- Can’t be defined on Join Index, Hash Index, PPI Table
*** Failure 5784 Illegal usage of Identity Column CustomerId.
- Can’t be defined on secondary value ordered index bigger than 4 Bytes:
*** Failure 5466 Error in Secondary Index DDL, Order by column is non-numeric or is more than 4 bytes.
- Can’t be defined in Global Temporary Tables or Volatile Tables:
*** CREATE TABLE Failed.  Illegal usage of Identity Column CustomerId
- Identity Columns with GENERATED ALWAYS can’t be updated.
*** UPDATE Failed.  The GENERATED ALWAYS Identity Column CustomerId may not be updated.
- Can’t be defined on non-partitioned NOPI tables:
*** Failure 3706 Syntax error: Identity Column in a NOPI table is unsupported.
Here you can read the article about surrogate keys, for whose creation Teradata IDENTITY columns are often used:
Here is the link to the documentation of Teradata identity columns: https://docs.teradata.com/reader/scPHvjfglIlB8F70YliLAw/WdfF9oz3QtNau24q2XvXXA