Teradata Identity Columns

By Roland Wenzlofsky

December 30, 2019

What are the features of a Teradata identity column?

What is a Teradata Identity column?

The identity column feature lets the system generate a unique number for each row inserted into a table. Identity columns can be used for transactional or bulk inserts.

Which data types can be used for Identity columns?

Integer columns (INTEGER, BIGINT, etc.) and DECIMAL columns without decimal places, such as DECIMAL(10,0).

What are the advantages compared to other methods of generating numbers?

No error-prone generation of unique values is required at the application level. Using the identity column as the primary index also spreads the rows evenly across all AMPs, because every row gets a different value.

Is it guaranteed that no duplicates are created?

Uniqueness is only guaranteed if identity columns are created with the options GENERATED ALWAYS and NO CYCLE. These two options will be presented later in this article.

Are identity columns supported by the load utilities (Fastload, Multiload, etc.)?

Yes, identity columns are supported for both transactional inserts and bulk loads.

Are the numbers always nicely ascending and without gaps?

No. For performance reasons, the numbers are handed out in cached ranges to processes that work in parallel, so neither ascending order nor gap-free sequences are guaranteed. If you need ascending numbers without gaps, ROW_NUMBER() is more suitable.
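One common way to get gap-free, ascending keys is to number the incoming rows with ROW_NUMBER() and add the highest key already present. The following is only a minimal sketch: the tables Customer_Stage and Customer_Dim and the column CustomerKey are hypothetical names that do not appear elsewhere in this article.

```sql
-- Hypothetical tables: Customer_Stage (new rows) and Customer_Dim (target).
INSERT INTO Customer_Dim (CustomerKey, LastName, FirstName)
SELECT
    mx.MaxKey + ROW_NUMBER() OVER (ORDER BY s.LastName, s.FirstName) AS CustomerKey,
    s.LastName,
    s.FirstName
FROM Customer_Stage s
CROSS JOIN (
    -- Highest key already in the target; 0 if the target is still empty.
    SELECT COALESCE(MAX(CustomerKey), 0) AS MaxKey
    FROM Customer_Dim
) mx;
```

Only one such load should run against the target at a time; otherwise two sessions could compute the same keys.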

What is the syntax for creating a table with Identity Column?

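Here is an example:
CREATE TABLE Customer (
-- Identity data type and option values are assumed for illustration.
CustomerId INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 NO CYCLE),
LastName VARCHAR(255),
FirstName VARCHAR(255)
) PRIMARY INDEX (CustomerId);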
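A short sketch of how rows might then be inserted into the Customer table, assuming CustomerId was defined as GENERATED ALWAYS. The names inserted here are made-up sample data.

```sql
-- CustomerId is omitted from the column list: the system assigns it.
INSERT INTO Customer (LastName, FirstName) VALUES ('Smith', 'Anna');
INSERT INTO Customer (LastName, FirstName) VALUES ('Jones', 'Peter');

-- Inspect the generated values; do not expect them to be consecutive.
SELECT CustomerId, LastName, FirstName
FROM Customer
ORDER BY CustomerId;
```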

What is one of the main applications for identity columns?

Creating Surrogate Keys.

What happens in the case of NO CYCLE when MAXVALUE is reached?

This error is reported: “*** Failure 7545 Numbering for Identity Column Customer.CustomerId is over its limit.”

Where can I see the values of an identity column (current value, maximum value, minimum value, etc.)?

These values are stored in table DBC.IDCOL:
DatabaseId BYTE(4) NOT NULL,
AvailValue DECIMAL(18,0) FORMAT '----,---,---,---,---,--9' NOT NULL,
StartValue DECIMAL(18,0) FORMAT '----,---,---,---,---,--9' NOT NULL,
MinValue DECIMAL(18,0) FORMAT '----,---,---,---,---,--9' NOT NULL,
MaxValue DECIMAL(18,0) FORMAT '----,---,---,---,---,--9' NOT NULL,
Increment INTEGER FORMAT '--,---,---,--9' NOT NULL,

Here is a query that lists all values:

SELECT
CAST(d.DatabaseName AS VARCHAR(255)) DatabaseName,
CAST(t.TVMName AS VARCHAR(255)) TableName,
i.AvailValue, i.StartValue, i.MinValue, i.MaxValue, i.Increment
FROM DBC.IdCol i
INNER JOIN DBC.TVM t ON
i.DatabaseId = t.DatabaseId
AND i.TableId = t.TVMId
INNER JOIN DBC.Dbase d ON
t.DatabaseId = d.DatabaseId;
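The same dictionary tables can be used to spot identity columns that are close to their limit before error 7545 occurs. This is a sketch under two assumptions: AvailValue is read as the next value that has not yet been handed out to a cache, and the session is allowed to SELECT from the DBC base tables. RemainingValues is a name introduced here.

```sql
SELECT
    CAST(d.DatabaseName AS VARCHAR(255)) AS DatabaseName,
    CAST(t.TVMName AS VARCHAR(255)) AS TableName,
    i.AvailValue,
    i.MaxValue,
    i.MaxValue - i.AvailValue AS RemainingValues
FROM DBC.IdCol i
INNER JOIN DBC.TVM t
    ON i.DatabaseId = t.DatabaseId
   AND i.TableId = t.TVMId
INNER JOIN DBC.Dbase d
    ON t.DatabaseId = d.DatabaseId
WHERE i.Increment > 0          -- ascending sequences only
ORDER BY RemainingValues;      -- tables nearest to their limit first
```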

What Are The Most Important Options Of The Teradata Identity Column?

  • GENERATED ALWAYS: Always generates a value, whether or not the query supplies one.
  • GENERATED BY DEFAULT: Generates a value only if the column value supplied in the SQL INSERT statement is NULL. Useful for copying rows into a table that has an identity column.
  • START WITH: The first number to be used in the system-generated numeric sequence.
  • INCREMENT BY: The step size by which each generated number is incremented (one by default).
  • MINVALUE: The minimum value to which a generated number can decrement. If MINVALUE is not specified, -2,147,483,647 is used (the minimum value for INTEGER).
  • MAXVALUE: The maximum value to which a generated number can increment.
  • CYCLE: Defines whether generated values can be reused when their minimum or maximum is reached. With NO CYCLE, an error is reported instead.
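The difference between the two GENERATED variants is easiest to see when copying rows. The sketch below assumes a source table Customer with the columns used earlier in this article; Customer_Copy, the option values, and the sample names are hypothetical.

```sql
-- Hypothetical target table that accepts existing key values.
CREATE TABLE Customer_Copy (
    CustomerId INTEGER GENERATED BY DEFAULT AS IDENTITY
        (START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 NO CYCLE),
    LastName  VARCHAR(255),
    FirstName VARCHAR(255)
) PRIMARY INDEX (CustomerId);

-- Existing keys are supplied, so they are kept as they are.
INSERT INTO Customer_Copy (CustomerId, LastName, FirstName)
SELECT CustomerId, LastName, FirstName
FROM Customer;

-- NULL is supplied, so the system generates a value.
INSERT INTO Customer_Copy (CustomerId, LastName, FirstName)
VALUES (NULL, 'Doe', 'Jane');
```

Because supplied and generated values come from different sources, they can collide. In practice, START WITH would be set above the highest copied key.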

The Teradata Identity Column And Performance

When initially bulk-loading into a table with an Identity Column, there is some overhead involved, as each VPROC has to get a range of numbers from DBC.IdCol, and add them to its local cache.

This overhead occurs only initially; afterwards, numbers are served from the cache.

Only when a VPROC runs out of numbers does it fetch another range. Creating a few thousand identity column values takes a few seconds.

Bulk insert performance can be improved by increasing the DBSControl setting IdColBatchSize, which reduces the number of times the DBC.IdCol table has to be accessed to get a new range of numbers.

How are Identity Columns Generated?

How the values of the Identity Column are generated depends on how the rows are inserted into a table.

  • INSERT…SELECT: The values are cached on the AMPs
  • Single Row or USING clause: The values are cached on the Parsing Engine (PE)

As VPROCs (AMPs or PEs) are working in parallel, and independently from each other, the numbers generated and inserted into a table are not in chronological order.

There can be gaps between the generated numbers. A larger IdColBatchSize makes larger gaps more likely after a system restart and between loads, when not all cached numbers have been consumed.
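To see where gaps actually occurred in a loaded table, each value can be compared with the next higher one. This is a sketch assuming a table Customer whose identity column CustomerId uses INCREMENT BY 1; the aliases NextId, GapStartsAfter, and MissingValues are introduced here.

```sql
SELECT
    CustomerId AS GapStartsAfter,
    NextId - CustomerId - 1 AS MissingValues
FROM (
    SELECT
        CustomerId,
        -- Next higher CustomerId, taken from the following row in sort order.
        MIN(CustomerId) OVER (
            ORDER BY CustomerId
            ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
        ) AS NextId
    FROM Customer
) t
WHERE NextId - CustomerId > 1
ORDER BY GapStartsAfter;
```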

Limitations Of Identity Columns

  • Each table can only have one identity column

    *** Failure 3706 Syntax error: A table may not have more than one identity column.

  • ALTER TABLE can't add an identity column

    *** ALTER TABLE Failed. 3706: Syntax error: Cannot add new Identity Column option
  • Can't be part of a composite primary or secondary index

    *** Failure 5784 Illegal usage of Identity Column CustomerId.
  • Can't be defined on Join Index, Hash Index, PPI Table

    *** Failure 5784 Illegal usage of Identity Column CustomerId.
  • Can't be defined on secondary value ordered index bigger than 4 Bytes:

    *** Failure 5466 Error in Secondary Index DDL, Order by column is non-numeric or is more than 4 bytes.
  • Can't be defined in Global Temporary Tables or Volatile Tables:

    *** CREATE TABLE Failed.  [5784] Illegal usage of Identity Column CustomerId
  • Identity Columns with GENERATED ALWAYS can't be updated

    *** UPDATE Failed.  [5776] The GENERATED ALWAYS Identity Column CustomerId may not be updated.
  • Can't be defined on non-partitioned NOPI tables:

    *** Failure 3706 Syntax error: Identity Column in a NOPI table is unsupported.
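Since ALTER TABLE cannot add an identity column, an existing table is usually rebuilt instead. The following sketch shows one possible sequence; Customer_Old, Customer_New, Customer_Old_Backup, and the option values are hypothetical.

```sql
-- Hypothetical existing table without an identity column: Customer_Old (LastName, FirstName).
CREATE TABLE Customer_New (
    CustomerId INTEGER GENERATED ALWAYS AS IDENTITY
        (START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 NO CYCLE),
    LastName  VARCHAR(255),
    FirstName VARCHAR(255)
) PRIMARY INDEX (CustomerId);

-- INSERT...SELECT without the identity column; values are assigned during the load.
INSERT INTO Customer_New (LastName, FirstName)
SELECT LastName, FirstName
FROM Customer_Old;

-- Swap the tables once the copy has been verified.
RENAME TABLE Customer_Old TO Customer_Old_Backup;
RENAME TABLE Customer_New TO Customer_Old;
```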




  • It is useful to point out that tables with GENERATED ALWAYS identity columns can’t be used when copying a table with the CREATE TABLE AS syntax. Also, any attempt to insert a value for the GENERATED ALWAYS identity column will cause an error. Identity column tables can be problematic in some data replication scenarios.

    • Absolutely agree with @MBrayshaw.

      Identity columns add maintenance and replication pain. It is harder to restore the table, harder to run a CREATE TABLE AS SELECT, and harder to replicate the table. Generated values are not dense and do not enforce order within a batch.

      I think that identity columns may be one of the weakest parts of Teradata and, in my experience, I’d recommend avoiding them if possible.

      It is usually better to let the ETL process generate a new ID for a surrogate key using its own capabilities. Alternatively, for small tables it is sometimes better to calculate a new ID with the ROW_NUMBER() analytical function.

      Nevertheless, Teradata offers no other way to generate IDs (apart from identity columns and manual ROW_NUMBER() generation), so you have to use them if you are restricted to Teradata’s own facilities.

      Best regards,
