Teradata Surrogate Keys Guidelines

3
1868

Teradata Surrogate Keys

What are Surrogate Keys?

Surrogate keys are a concept used in data warehousing, to base the primary keys of a table on generated and unique artificial values (usually integer values) instead of using the columns from the source system to identify the rows in a table unambiguously (in database terminology this is called a natural key).

Why should you use Surrogate Keys?

There are several reasons why it is a good idea to replace the natural keys with Teradata surrogate keys.

At first, you may have more than one source system delivering the same value in the natural key for the same target table. This fact forces you to use surrogate keys because keeping the natural key from both sources would not allow you to distinguish the objects.

Likewise, you may have the opposite case:

You may have two source systems delivering the same object, but with different natural keys.

Imagine, for example, a company where potential clients are initially recorded in a database for possible customers, being moved to the production billing system, as soon as they signed a contract with this company.

Both operational systems may have their natural keys to identify the same client. Typically, in a data warehouse, you want to keep track of the complete customer life cycle. Therefore you need one key to identifying this client and don’t care about the operational system where this client was handled.

Unfortunately, many Teradata data warehouse implementations decide against surrogate keys and put the effort of integrating the information into the reports.

This approach will ultimately cause more costs than having a right surrogate key design from the begin (don’t forget, such an approach requires to repeat the logic of combining object information into each and every single report).

Even worse, if the natural key names are taken from the source system, and you undergo a replacement of the origin system. You never should tightly couple your data model to the system of origin structures. An invoice table should be called Invoice and not something like SAP-Invoice.

Initially it is always faster to take a pass on surrogate keys and integrate source system tables 1:1, but believe me, you just move the costs to a later stage in the project. Be prepared for a costly redesign.

Apart from making you independent from source system influences, surrogate keys, if properly used, may increase your performance as you can replace several natural key columns by an integer column, which can save space and decrease disk IOs.

 

Our Reader Score
[Total: 10    Average: 3.4/5]
Teradata Surrogate Keys Guidelines written by Roland Wenzlofsky on May 19, 2014 average rating 3.4/5 - 10 user ratings

3 COMMENTS

  1. Terdata and few td consultants have fooled organisations with this concept and complicated simple things. Simple surrogate key is to maintain slowly changing dimensions and not for integration of different sources as said in this post. If integration was purpose like example of customer lifecycle then its a separate topic of master data management, dont try to mix all master data , metadata(gcfr) and all things in one and complicate things. I have seen organisations sufferinf becoz of such design and majorly some pakistani consultants have been ruling these confusions and securing their jobs and sucking clients money.
    Honestly teradata is amazing product but design it right way and keep it simple

    • I give you an example, and would kindly ask you to tell me how you would design the model without surrogate keys: You have two billing systems, invoices have to be stored in the same entity (“Invoice”). Natural keys of both billing systems are overlapping. How would you design the “Invoice” table without surrogate keys?

LEAVE A REPLY

Please enter your comment!
Please enter your name here