What are No Primary Index tables?

Starting with Teradata Release 13, tables don’t require a Primary Index.

Nevertheless, the primary index still is the main idea to meet an even data distribution on a Teradata system. By design, the primary index ensures linear scalability.

Hence, the question is: What is the meaning of tables without a primary index, how are they implemented,  and how do they fit into the hashing design of Teradata?

The rows of a No Primary Index (NOPI) table are distributed randomly across all AMPs. No hashing takes place, but rows have to be uniquely identifiable. Therefore the ROWID is generated differently than the ROWID of a primary index table:

Teradata uses the HASHBUCKET of the responsible AMP and adds a uniqueness value. The bytes usually occupied by the hash value are used to increase the range for generating “uniqueness” values.

Below is the syntax to create a No Primary Index Table:

CREATE TABLE <TABLE> (
PK INTEGER NOT NULL

) NO PRIMARY INDEX;

If neither PRIMARY INDEX nor NO PRIMARY INDEX is defined, depending on the DBS control field PrimaryIndexDefault, the result may be a table without primary index (NO PRIMARY INDEX, PrimaryIndexDefault=P, D);  or the first not nullable column of the table definition will be used as the PRIMARY INDEX (if all columns are nullable, the first nullable column of the CREATE TABLE statement is used as the PRIMARY INDEX (PrimaryIndexDefault=N).

 

How No Primary Index Tables work

No Primary Index tables are distributed randomly across the AMPs. Therefore bulk loading of NOPI tables (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.

There is a definite performance impact in FastLoads:

1. Incoming rows are distributed in a round-robin fashion randomly across all AMPs
2. The rows are hashed by the primary index value and forwarded to the responsible AMPs
3. The responsible AMPs sort the received rows by ROWID

Let us consider a No Primary Index table:  After distributing the rows randomly across the AMPs, it’s finished.

No hashing and redistribution take place. No sorting is needed.

Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed evenly across all AMPs, and no skewing happens.

All activities described above speed up the load process, as only the acquisition phase of the loading utilities is executed.

Another advantage of NOPI tables is that the rows will be appended to the end of the table’s data blocks. Thus any overhead usually caused by sorting the rows by row hash into the data blocks is avoided. For example, in case you INSERT…SELECT  massive amounts of rows into your NOPI table,  above described effect, will cut the number of IOs significantly.

No Primary Index Tables and Skewing

NOPI tables will not become skewed, with a few exceptions:

  • Bulk loaded small tables will always be skewed. Round robin happens on block level and not the row level. Some AMPs will receive data blocks, and the rest will receive no data.
  • If you execute an INSERT…SELECT statement, from a primary index table into an NOPI table, AMP-local copying of the rows will be applied. All rows stay on their current AMP, and if the primary index table is skewed you, will end up with a skewed NOPI table.
  • Skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.

The Usage of No Primary Index Tables

NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans. To offset this impact, you could add a unique secondary index (USI) to your NOPI table.

There are standard features used by NOPI tables and the Teradata Columnar implementation, introduced with Teradata 14: Column partitioned tables are always NOPI tables, and although offering great performance improvements for certain workload types, they lack as well the fast primary index access.

To some extent, the mentioned disadvantage of no primary index tables can be compensated with join indexes or secondary indexes.

No Primary Index tables should not be used as production tables, mainly consider using them during the ETL-Process, in situations when anyway a full table scan is required.

NOPI tables come with some further restrictions, find the most important ones:

– Only MULTISET tables can be created
– No identity columns can be used
– NoPi tables cannot be partitioned with a PPI
– No statements with an update character allowed (UPDATE, MERGE INTO, aUPSERT), still you can use INSERT, DELETE and SELECT
– No Permanent Journal possible
– Cannot be defined as Queue Tables
– No Queue Tables allowed
– Update Triggers cannot update an NOPI table (probably introduces with a later release)
– No hash indexes are allowed (use join indexes instead)

The following features can be used as usual together with NOPI tables:

– Fallback protection of the table
– Secondary Indexes (USI, NUSI)
– Join Indexes
– CHECK and UNIQUE constraints
– Triggers
– Collection of statistics

The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables.  The tablekind is ‘O’:

SELECT * FROM DBC.TABLES WHERE TABLEKIND = ‘O’;

Our Reader Score
[Total: 22    Average: 4.4/5]
No Primary Index Tables on Teradata – NoPi written by Roland Wenzlofsky on October 18, 2014 average rating 4.4/5 - 22 user ratings

6 COMMENTS

  1. Please explain below statement in detail

    skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.

  2. Hi Murali. Sure I can give you one example from real life. I am using NOPI tables in the following scenario: We are staging data into Teradata and adding a surrogate key. In a next step, the tables are exported again and the transformation takes place in an ETL-Tool. The staging is done with fastloads. As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan). Therefore the staging tables are NOPI, which reduces the resource usage of the fastloads (no sort step required).

    • This may come as naive. But Roland , can you please explain how surrogate key results in full table scan.
      I think surrogate key is introduced to apply uniqueness which should not result in FTS.

      I could not understand this .
      “As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan)”

      • Hi Niraj,

        I think this is a misunderstanding. What I wanted to say is that the creation of the surrogate keys touches each table row when inserting the surrogate keys based on the natural key of the table.

LEAVE A REPLY

Please enter your comment!
Please enter your name here