Teradata Identity Columns

4
2011

Teradata Identity ColumnWhat are Teradata Identity Columns?

The identity column feature allows generating unique numbers for each row inserted into a column. Identity columns can be uses for transactional or bulk inserts.

By using Identity Columns, we can reduce costs because column uniqueness is ensured without the overhead caused when using a unique constraint.

Furthermore, no error-prone generation of unique values on application level is required.

If the Identity Column is used as the Primary Index of a table, it ensures an even distribution of rows across all AMPs.

When initially bulk-loading into a table with an Identity Column, there is some overhead involved, as each vproc has to reserve a range of numbers from DBC.IdCol, and add them to its local cache. But this overhead is only happening initially, as the later handling of numbers is done from the cache.

Only when a vproc runs out of numbers, it will get another range of numbers.

As vprocs are working in parallel, and independently from each other, the numbers generated and inserted into a table are not in chronological order. There can be gaps.

Bulk insert performance can be optimized, by increasing the DBSControl setting IdColBatchSize, which decreases the number of timed the DBC.IdCol tables have to be accessed to get a new range of numbers.

Still, this will increase the likely-hood of having larger gaps in the figures in the case of a system restart, and between loads, when not all numbers are consumed.

The default batch size for number ranges is 100.000.

Useful Identity Column Options

  • GENERATED ALWAYS: Generate always a value, whether or not a value has been specified in the input (i.e. the value is not NULL)
  • GENERATED BY DEFAULT: Generate only an identity value if the column value delivered is NULL
  • START WITH: The smallest number in the system-generated numeric sequence.
  • INCREMENT BY:  The step size by which each generated number gets incremented
  • MINVALUE: The minimum value to which a generated number can decrement
  • MAXVALUE: The maximum value by which a generated number can increment
  • CYCLE: Defines whether generated values can be reused when their minimum or maximum is reached.

 

Our Reader Score
[Total: 6    Average: 4.3/5]
Teradata Identity Columns written by Roland Wenzlofsky on March 2, 2017 average rating 4.3/5 - 6 user ratings

4 COMMENTS

  1. It is useful to point out that tables with Generated Always Identity Columns can’t be used when copying a table with created table as syntax. Also any attempt to insert a value for the GA ID column will cause an error. Identity column tables can be problematic for some scenarios of data replication.

    • Absolutely agree with @MBrayshaw.

      Identity columns add maintenance and replication pain. It is harder to restore table, harder to make a Create Table As Select, harder to replicate table. Generated values are not dense, and do not enforce an order inside a batch.

      I think that Identity columns may be one of the weakest part of Teradata and according my experience I’d recommend do not use them if possible.

      It is usually better to generate a new ID for surrogate key by ETL process using it’s possibilities. As an alternative sometimes it is better to calculate a new ID for small tables by using ROW_NUMBER() analytical function.

      Nevertheless Teradata has no other opportunities to generate ID(except Identity columns and manual ROW_NUMBER() generation), so you have to use them if you are restricted by only Teradata’s facilities.

      Best regards,
      Aleksei.

LEAVE A REPLY

Please enter your comment!
Please enter your name here