Understanding Teradata NOPI Tables | DWHPRO
6

Understanding Teradata NOPI Tables

The Essential Facts About Teradata NOPI Tables

What are Teradata NOPI tables?

The rows of NOPI tables are not distributed evenly across all AMPs using the hashing algorithm as is the case with tables with a primary index. They are not intended to replace primary index tables but are used in certain situations.

How are the rows of NOPI tables distributed across the AMPs?

The rows are distributed evenly but randomly across all AMPs. Rows can therefore not be found without the presence of a secondary index.

How are the NOPI Tables created?

This is the syntax:
CREATE TABLE Customer
(
PK BIGINT NOT NULL
) NO PRIMARY INDEX;

If a primary key or a UNIQUE constraint is defined when creating the NOPI table, Teradata creates a unique secondary index to ensure uniqueness.

How can Teradata mark the rows of a NOPI table if no hashing is used?

For a NOPI table, a full table scan is always necessary if there is no other index that can be used. The AMP on which the searched rows are located is not known.

To identify the rows within the data blocks of the AMP a ROWID is needed. While for PI tables this is composed of the ROWHASH and a UNIQUENESS value, for NOPI tables the hash bucket of the AMP and a UNIQUENESS value is used to form the ROWID.

Since NOPI tables are a design change that was only introduced in one of the newer releases of Teradata, it was decided to use the bytes usually being the ROWHASH of PI tables to extend the value range of the UNIQUENESS value for NOPI tables.

How does Teradata determine which AMP gets which NOPI row?

A random generator is used. This is available on the AMPs as well as on the parsing engines.

The AMP random generator is used for fast load and the random generator on the parsing engine for SQL requests.

During fast load, the random generator of the AMPs distributes the rows in the round-robin procedure.

For SQL requests the DBQL QueryId of the request is used to generate a hash value which is used to distribute the rows,

What happens if I skip defining the Primary Index in the DDL statement of a CREATE TABLE?

Suppose you create a table and forget the primary index:

CREATE TABLE Customer
(
PK BIGINT NOT NULL
) ;

Which type of table is created depends on a system setting. If the DBC Control Field is set to “P” or “D” in the settings, a Primary Index Table is created.

In this case, the first NOT NULL column of the table definition is used as PRIMARY INDEX (if all columns are NOT NULL, the first NULLable column is used as PRIMARY INDEX).

If the DBC Control Field “PrimaryIndexDefault” is set to “N”, a NOPI table is created.

How does the use of a NOPI table affect the bulk load performance?

Loading a NOPI table, which is otherwise identical to a PI table, is more efficient.

Let's look at why: The Fastload is done in several phases.
Incoming rows (which are sent in blocks from the client) are first randomly distributed over the AMPs (round-robin) and then the rowhash is formed over the primary index columns and the rows are sent to the correct AMPs. There the rows are collected, sorted, and written in data blocks to the mass memory.

For NOPI tables the rows are only received. There is no need to redistribute the rows and no sorting is required.

Because the rows are evenly distributed over all the AMPs when a NOPI table is fast loaded, tables loaded this way are almost never skewed (exceptions for small tables see below).

A further performance advantage results from the simple fact that new rows are simply appended to the end of data blocks (the rows or pointers do not have to be sorted in ascending order by rowhash within the data blocks)

For an “INSERT…SELECT” statement of a huge amount of rows, the number of IOs is significantly lower for a NOPI table than for a PI table.

How do NOPI tables affect query performance?

Since no access via hashing is possible the query of NOPI tables requires a full table scan. This can be prevented by creating a unique secondary index or join index on the table.

teradata NOPI

Can NOPI tables never be skewed?

NOPI tables can also be skewed, but only in certain cases.

If a small table is loaded with a fast load, skew inevitably occurs because the data is not sent to the AMPs row by row but in blocks.
If the table is small, it is possible that all rows are sent in one block and end up on one single AMP.

When executing an “INSERT…SELECT” statement, from a primary index table into a NOPI table, AMP-local copying of the rows will be applied. All rows stay on their current AMP, and if the primary index table is skewed the NOPI table will be skewed as well.

This can be avoided by using the HASH BY RANDOM option. Data blocks are distributed randomly and give an even distribution.

INSERT INTO Customer SELECT * FROM Customer_2020
HASH BY RANDOM;

If you want to distribute on row level to achieve even better randomness you can use this statement:

INSERT INTO Customer SELECT * FROM Customer_2020
HASH BY RANDOM (1,100000000);


Furthermore, skewing of a NOPI table can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.

teradata NOPIteradata NOPI

What are the use cases for NOPI tables?


Remark: All column partitioned tables are always NOPI tables. This is a design limitation of Teradata.

NOPI tables are used as stage tables. This makes sense e.g. if the source files are loaded 1:1 with fast load and then a surrogate key is inserted on each row.

Then a full table scan is necessary in any case and a PI table would have no advantage.

A NOPI can also be useful if no non-skewed suitable primary index can be found or for quick testing of data without having to worry about the correct choice of the primary index.

How can I find all the NOPI tables in my database?

With the following SQL statement:
SELECT * FROM DBC.TABLES WHERE TABLEKIND = ‘O';

Teradata NOPI Table Limitations

  • No SET tables allowed
  • No Identity columns allowed
  • No Row-Level Partitioning allowed
  • No UPDATE, MERGE INTO, or UPSERT allowed
  • No Permanent Journaling possible
  • Can't be defined as Queue Tables
  • No UPDATE triggers allowed
  • No Hash Index allowed
  • No Multiload

Teradata NOPI Tables – What's allowed?

  • Fallback protection
  • Secondary Indexes (USI and NUSI)
  • Join Indexes
  • CHECK and UNIQUE constraints
  • Triggers
  • COLLECT STATISTICS
  • Global Temporary Tables
  • Volatile Tables
  • CLOB and BLOB Data Types
  • Fastload but duplicates are not filtered

Teradata NOPI – Loading Considerations

  • INSERT INTO Customer VALUES (100,'Roland', ‘Wenzlofsky');

    The parsing engine random generator selects a random AMP and appends the row to the end of the Teradata NOPI table
  • INSERT INTO Customer SELECT * FROM Customer_2020;

    If Customer_2020 is a PI table, the rows will not be redistributed but stay on the same AMP. Skewing is possible!
  • Fastload

    Data blocks are received by the AMPs which use the random generator to distribute the rows randomly and evenly.
Teradata Architecture Is So Famous, But Why?
What Everyone Should Now About The Teradata Design
Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Shivkumar says:

    Please explain the below statement in detail

    skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.

  • Hi Murali. Sure I can give you one example from real life. I am using NOPI tables in the following scenario: We are staging data into Teradata and adding a surrogate key. In the next step, the tables are exported again and the transformation takes place in an ETL-Tool. The staging is done with fastloads. As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan). Therefore the staging tables are NOPI, which reduces the resource usage of the fastloads (no sort step required).

    • Avatar Murali says:

      Thanks for sharing this idea…

    • Avatar Niraj says:

      This may come as naive. But Roland, can you please explain how surrogate key results in full table scan.
      I think a surrogate key is introduced to apply uniqueness which should not result in FTS.

      I could not understand this.
      “As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan)”

      • Hi Niraj,

        I think this is a misunderstanding. What I wanted to say is that the creation of the surrogate keys touches each table row when inserting the surrogate keys based on the natural key of the table.

  • Avatar Murali says:

    Thanks for covering no PI tables. Can you provide one example for using no PI table compare to the PI table?

  • >