fbpx

Understanding Teradata NOPI Tables

By Roland Wenzlofsky

September 8, 2020


The Essential Facts About NOPI Tables in Teradata

What are Teradata NOPI tables?

The rows of a Teradata NOPI table are not distributed evenly across all AMPs using the hashing algorithm, as is the case with a primary index. They are not intended to replace primary index tables but are used in certain situations.

How are the rows of NOPI tables distributed across the AMPs?

The rows are distributed evenly but randomly across all AMPs. Rows can, therefore, not be found without the presence of a secondary index.

How are the NOPI Tables created?

This is the syntax:
CREATE TABLE Customer
(
PK BIGINT NOT NULL
) NO PRIMARY INDEX;

If a primary key or a UNIQUE constraint is defined when creating the NOPI table, Teradata creates a unique secondary index to ensure uniqueness.

How can Teradata mark the rows of a NOPI table if no hashing is used?

For a NOPI table, a full table scan is always necessary if there is no other index that can be used. The AMP on which the searched rows are located is not known.

To identify the rows within the data blocks of the AMP, a ROWID is needed. While for PI tables, this is composed of the ROWHASH and a UNIQUENESS value, for NOPI tables, the hash bucket of the AMP and a UNIQUENESS value is used to form the ROWID.

Since NOPI tables are a design change that was only introduced in one of the newer releases of Teradata, it was decided to use the bytes, usually being the ROWHASH of PI tables, to extend the value range of the UNIQUENESS value for NOPI tables.

How does Teradata determine which AMP gets which NOPI row?

A random generator is used. This is available on the AMPs as well as on the parsing engines.

The AMP random generator is used for fast load and the random generator on the parsing engine for SQL requests.

During fast load, the random generator of the AMPs distributes the rows in the round-robin procedure.

For SQL requests, the DBQL QueryId of the request is used to generate a hash value which is used to distribute the rows,

What happens if I skip defining the Primary Index in the DDL statement of a CREATE TABLE?

Suppose you create a table and forget the primary index:

CREATE TABLE Customer
(
PK BIGINT NOT NULL
) ;

Which type of table is created depends on a system setting. If the DBC Control Field is set to “P” or “D” in the settings, a Primary Index Table is created.

In this case, the first NOT NULL column of the table definition is used as PRIMARY INDEX (if all columns are NOT NULL, the first NULLable column is used as PRIMARY INDEX).

If the DBC Control Field “PrimaryIndexDefault” is set to “N”, a NOPI table is created.

How does the use of a NOPI table affect the bulk load performance?

Bulk loading a NOPI table is more efficient than loading an identical PI table.
Fastload for PI tables is done in several stages.
Incoming rows (which are sent in blocks from the client) are first randomly distributed over the AMPs (round-robin). Afterward, the rowhash is calculated over the primary index columns, and the rows are sent to the right AMPs. On the target AMPs, the rows are collected, sorted, and written in data blocks to the disk.

For NOPI tables, the rows are received by random AMPs, but there is no need to redistribute and sort them. Furthermore, the blocks transferred from the client to Teradata are larger than the ones for PI tables.

Because the rows are evenly distributed over all the AMPs when a NOPI table is fast loaded, tables are rarely skewed (except small tables).

A further performance advantage results from the fact that new rows are appended at the end of the table/data blocks (the rows do not have to be sorted in ascending order by rowhash within the data blocks)

For an “INSERT…SELECT” statement of a huge number of rows: The number of I/Os is significantly lower for a NOPI table than for a PI table.

How do NOPI tables affect query performance?

Since no access path via hashing exists, all queries against a NOPI table require a full table scan. The aforementioned can be prevented by creating a unique secondary index (USI) or a join index.

teradata NOPI

Can NOPI tables never be skewed?

NOPI tables can be skewed in specific cases.

If a small table is loaded with a fast load, skew inevitably occurs because the data is not sent to the AMPs row by row but in blocks of 64KB or bigger. All rows may be sent to the same AMP and one data block.

When executing an “INSERT…SELECT” statement, from a primary index table into a NOPI table, AMP-local copying of the rows will be applied. All rows stay on their current AMP, and if the primary index table is skewed, the NOPI table will be skewed.

Skew can be avoided by using the HASH BY RANDOM option; data blocks are distributed randomly, which results in an even distribution.

INSERT INTO Customer SELECT * FROM Customer_2020
HASH BY RANDOM;

If you want to distribute on row level to achieve even better randomness, you can use this statement:

INSERT INTO Customer SELECT * FROM Customer_2020
HASH BY RANDOM (1,100000000);

Furthermore, skewing of a NOPI table can happen by executing an archive job on one system and restoring the backup on a different system.

teradata NOPIteradata NOPI

What are the use cases for NOPI tables?

NOPI tables are used as stage tables. This makes sense e.g., if the source files are loaded 1:1 with fast load and then a surrogate key is added to each row. Then a full table scan is required in any case, and a PI table would have no performance advantage.

A NOPI can also be useful if no non-skewed suitable primary index can be found or for quick testing of data without worrying about the primary index’s correct choice.

How can I find all the NOPI tables?

The Teradata tablekind for NOPI tables is ‘O’. With the following SQL statement, we can find all NOPI tables in our system:

SELECT * FROM DBC.TABLESV WHERE TABLEKIND = ‘O’;

Why can I read a NOPI table during loading?

When a NoPI table is loaded with FastLoad, users can read the table using an access lock. Dirty read access is allowed as rows are always appended to the end of a NoPI table. For tables with Primary Index, the redistribution and sorting of the rows prevent access during loading.

If I do not specify a primary index in the DDL statement, does Teradata create a NOPI table?

Depending on the system settings, if you don’t define a Primary Index, Teradata might make the first column a NUPI. If you want a NoPI table, you should always explicitly state NO PRIMARY INDEX.

Teradata NOPI Table Limitations

  • No SET tables allowed
  • No Identity columns allowed
  • No Row-Level Partitioning allowed
  • No UPDATE, MERGE INTO, or UPSERT allowed
  • No Permanent Journaling possible
  • Can’t be defined as Queue Tables
  • No UPDATE triggers allowed
  • No Hash Index allowed
  • No Multiload, as its load algorithm depends on the PI

NOPI in Teradata – What is allowed?

  • Fallback protection
  • Secondary Indexes (USI and NUSI)
  • Join Indexes
  • CHECK and UNIQUE constraints
  • Triggers
  • COLLECT STATISTICS
  • Global Temporary Tables
  • Volatile Tables
  • CLOB and BLOB Data Types
  • Fastload but duplicates are not filtered

Teradata NOPI – Loading Considerations

  • INSERT INTO Customer VALUES (100,’Roland’, ‘Wenzlofsky’);

    The parsing engine random generator selects a random AMP and appends the row to the end of the Teradata NOPI table.
  • INSERT INTO Customer SELECT * FROM Customer_2020;

    If Customer_2020 is a PI table, the rows will not be redistributed but stay on the same AMP. Skewing is possible!
  • Fastload

    Data blocks are received by the AMPs, which use the random generator to distribute the rows randomly and evenly.
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Avatar
    Peter Christensen says:

    Very nice article! This background knowledge is exactly what I have been looking for in order to understand how to optimize our TD loads. The Teradata documentation itself is certainly comprehensive but also a bit overwhelming, so blog posts like these are very valuable.

  • Avatar
    Shivkumar says:

    Please explain the below statement in detail

    skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.

  • Hi Murali. Sure I can give you one example from real life. I am using NOPI tables in the following scenario: We are staging data into Teradata and adding a surrogate key. In the next step, the tables are exported again and the transformation takes place in an ETL-Tool. The staging is done with fastloads. As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan). Therefore the staging tables are NOPI, which reduces the resource usage of the fastloads (no sort step required).

    • This may come as naive. But Roland, can you please explain how surrogate key results in full table scan.
      I think a surrogate key is introduced to apply uniqueness which should not result in FTS.

      I could not understand this.
      “As the adding of a surrogate key column is touching anyway each table row, there is no point in using PI tables (it requires always a full table scan)”

      • Hi Niraj,

        I think this is a misunderstanding. What I wanted to say is that the creation of the surrogate keys touches each table row when inserting the surrogate keys based on the natural key of the table.

  • Thanks for covering no PI tables. Can you provide one example for using no PI table compare to the PI table?

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >