Teradata Identity Columns to achieve better Performance

Roland Wenzlofsky

September 15, 2020

minutes reading time


What are the Features of the Teradata Identity Columns?

What is a Teradata Identity column?

The Teradata identity columns feature generates unique numbers for each row inserted into a table. We use Identity columns for transactional and bulk inserts.

What are the key benefits of a Teradata Identity Column?

• It is easiest to ensure row uniqueness without the performance disadvantage of unique constraints.
• Unique identifiers can be generated within the database and ensure integrity. Creating unique identifiers on the application level would be much more complicated and error-prone.
• As a Primary Index, the Teradata Identity columns ensure even data distribution across the AMPs.
• The identity column can be used to create primary keys of entities.
• Teradata Identity columns simplify application logic and the maintenance tasks for database administrators.

Which data types can be used for Identity columns?

Integer Columns (INTEGER, BIGINT, etc.), including columns of type DECIMAL without decimal places (DECIMAL(10,0)).

Teradata Identity Columns to achieve better Performance 1

What are the advantages compared to other methods of generating numbers?

The main advantage is the error-free generation of unique values on the application level. Using the identity columns as a primary index also ensures a perfect distribution of the rows across all AMPs.

Is it guaranteed that Teradata does not create duplicate numbers?

Uniqueness is only guaranteed if we define the identity columns with the GENERATED ALWAYS and NO CYCLE options. We will present these two options later in this article.

Are identity columns supported by the load utilities (Fastload, Multiload, etc.)?

Yes, identity columns support both transactional inserts and bulk loads.

Are the generated numbers in ascending order and without gaps?

No, because the way the numbers are generated (for performance reasons) does not allow this. If you need ascending numbers without gaps, ROW_NUMBER() is more suitable.

What is the syntax for creating a table with an Identity Column?

Here is an example:
CREATE TABLE Customer
(
CustomerId BIGINT GENERATED ALWAYS AS IDENTITY
(START WITH 1 INCREMENT BY 1 MINVALUE 100 NO CYCLE)
,
LastName VARCHAR(255),
FirstName VARCHAR(255)
) PRIMARY INDEX (CustomerId);

What is one of the main applications for identity columns?

Creating Surrogate Keys.

What happens in the case of NO CYCLE when MAXVALUE is reached?

Teradata returns the following error: “*** Failure 7545 Numbering for Identity Column Customer.CustomerId is over its limit.”

Where can I see the value range of an identity column (current value, maximum value, minimum value, etc.)?

Teradata stores information about identity columns in table DBC.IDCOL:
CREATE SET TABLE DBC.IDCOL
(
TableId BYTE(6) NOT NULL,
DatabaseId BYTE(4) NOT NULL,
AvailValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
StartValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
MinValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
MaxValue DECIMAL(18,0) FORMAT ‘—-,—,—,—,—,–9’ NOT NULL,
Increment INTEGER FORMAT ‘–,—,—,–9’ NOT NULL,
Cyc CHAR(1) CHARACTER SET LATIN UPPERCASE NOT CASESPECIFIC NOT NULL)
UNIQUE PRIMARY INDEX ( TableId );

Below query shows the details of all identity columns:

SELECT
“StartValue”,
“MinValue”,
“MaxValue”,
“Increment”,
Cyc,
AvailValue,
CAST(dbase.databasename AS VARCHAR(255)) DatabaseName,
CAST(dbc.tvm.tvmname AS VARCHAR(255)) TableName
FROM
dbc.idcol
INNER JOIN
dbc.tvm
ON
dbc.idcol.databaseid = dbc.tvm.databaseid
AND dbc.idcol.tableid = dbc.tvm.tvmid
INNER JOIN
dbc.dbase
ON
tvm.DatabaseId = dbase.DatabaseId;

What Are The Most Important Options Of The Teradata Identity Column?

  • GENERATED ALWAYS: Generate always a value, whether or not the query passes a value.
  • GENERATED BY DEFAULT: Generate only a value if the column value delivered in the SQL insert statement is NULL. We can use this property to copy existing data to a table with an identity column.
  • START WITH: The first number used in the system-generated numeric sequence.
  • INCREMENT BY:  The step size by which each generated number gets incremented (one by default).
  • MINVALUE: The minimum value to which a generated number can decrement. If MINVALUE is not specified, -2,147,483,647 is used (the minimum value for INTEGER).
  • MAXVALUE: The maximum value by which a generated number can increment.
  • CYCLE: Defines whether Teradata is allowed to reuse already generated values when it reaches the defined minimum or maximum values for an identity column.

The Teradata Identity Column and Performance

When bulk-loading into a table with an Identity Column, some overhead is involved, as each VPROC has to get a range of numbers from DBC.IdCol, and add them to its local cache.

But this overhead is only happening initially. Later, Teradata retrieves the numbers handling of numbers from the cache.

Only when a VPROC runs out of numbers will it get another range of numbers. Creating a few thousand identity column values takes a few seconds.

We can optimize bulk insert performance by increasing the DBSControl setting IdColBatchSize, which decreases the number of times the DBC.IdCol table has to be accessed to get a new range of numbers.

The performance improvement results from the fact that there are fewer or no updates on the DBC.IdCol table.

Due to larger batches, we must consider that many more numbers are lost during loading (or during a system restart). We should define the optimal batch size depending on the number of AMPs.

How are Teradata Identity Columns Generated?

How the Teradata identity columns’ values are generated depends on how the rows are inserted into a table.

  • INSERT…SELECT: The values are cached on the AMPs
  • Single Row or USING clause: The values are cached on the Parsing Engine (PE)

As VPROCs (AMPs or PEs) work in parallel and independently from each other, the numbers generated and inserted into a table are not in chronological order.

There can be gaps between the generated numbers. However, the likelihood of having more significant gaps in the numbers increases with a system restart, between loads, and when not all numbers are consumed.

Limitations Of Teradata Identity Columns

  • Each table can only have one identity column.

    *** Failure 3706 Syntax error: A table may not have more than one identity column.
  • ALTER TABLE can’t add an identity column

    *** ALTER TABLE Failed. 3706: Syntax error: Cannot add new Identity Column option
  • Can’t be part of a composite primary or secondary index

    *** Failure 5784 Illegal usage of Identity Column CustomerId.
  • Can’t be defined on Join Index, Hash Index, PPI Table

    *** Failure 5784 Illegal usage of Identity Column CustomerId.
  • Can’t be defined on secondary value ordered index bigger than 4 Bytes:

    *** Failure 5466 Error in Secondary Index DDL, Order by column is non-numeric or is more than 4 bytes.
  • Can’t be defined in Global Temporary Tables or Volatile Tables:

    *** CREATE TABLE Failed. [5784] Illegal usage of Identity Column CustomerId
  • Identity Columns with GENERATED ALWAYS can’t be updated.

    *** UPDATE Failed. [5776] The GENERATED ALWAYS Identity Column CustomerId may not be updated.
  • Can’t be defined on non-partitioned NOPI tables:

    *** Failure 3706 Syntax error: The identity Column in a NOPI table is unsupported.

In this article from Teradata, you can read how we can use Identity columns to create surrogate keys:

https://docs.teradata.com/reader/scPHvjfglIlB8F70YliLAw/WdfF9oz3QtNau24q2XvXXA

  • Avatar
    Maheswara Rao says:

    Good Article. Informative. Thanks for posting.

  • It will be more helpful if a few examples are added along with the syntax of the same. Thanks!

  • Avatar
    Mark Brayshaw says:

    It is useful to point out that tables with Generated Always Identity Columns can’t be used when copying a table with the created table as syntax. Also, any attempt to insert a value for the GA ID column will cause an error. Identity column tables can be problematic for some scenarios of data replication.

    • Avatar
      Aleksei Svitin says:

      Absolutely agree with @MBrayshaw.

      Identity columns add maintenance and replication pain. It is harder to restore the table, harder to make a Create Table As Select, harder to replicate table. Generated values are not dense, and do not enforce order inside a batch.

      I think that Identity columns may be one of the weakest parts of Teradata and according to my experience, I’d recommend do not use them if possible.

      It is usually better to generate a new ID for a surrogate key by the ETL process using its possibilities. As an alternative sometimes it is better to calculate a new ID for small tables by using ROW_NUMBER() analytical function.

      Nevertheless, Teradata has no other opportunities to generate ID(except Identity columns and manual ROW_NUMBER() generation), so you have to use them if you are restricted by only Teradata’s facilities.

      Best regards,
      Aleksei.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >