No Primary Index Tables on Teradata – NoPi
What are No Primary Index tables?
Starting with Teradata 13, tables can be created without a primary index.
However, this option does not replace the strategy of data distribution using hashing algorithms but represents an additional option for very specific cases.
So the question is: What are tables without a primary index used for and when can they replace primary index tables?
Since no hashing algorithm is used for tables without a primary index, the rows of a table are distributed randomly and evenly across all AMPs. However, rows must still be clearly identifiable.
Therefore, the RowId concept is also used for No Primary Index Tables (NOPI). However, the RowIds are generated differently from Primary Index tables.
The RowId at NOPI tables is composed of the hash bucket of the responsible AMP and a Uniqueness Value.
The bytes that are normally occupied by the hash value are used to increase the range for generating “Uniqueness” values.
So the question is: What do tables without a primary index mean, how are they implemented and how do they fit into the hashing design of Teradata?
The rows of a No Primary Index (NOPI) table are distributed randomly among all AMPs. There is no hashing, but rows must be uniquely identifiable. Therefore, the ROWID is generated differently than the ROWID of a primary index table:
Teradata uses the HASHBUCKET of the responsible AMP and adds a uniqueness value. The bytes that are normally occupied by the hash value are used to increase the range for generating “Uniqueness” values.
The following is the syntax for creating a No Primary Index Table:
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX;
If neither PRIMARY INDEX nor NO PRIMARY INDEX is specified in the DDL, the table created can be a table without a primary index (NO PRIMARY INDEX, PrimaryIndexDefault=P, D) or a primary index table, depending on the DBS control field “PrimaryIndexDefault”.
In this case, the first NOT NULL column of the table definition is used as PRIMARY INDEX (if all columns are NOT NULL, the first NULLable column is used as PRIMARY INDEX (PrimaryIndexDefault=N).
How No Primary Index Tables work
The rows of NOPI Tables are distributed randomly and evenly across all AMPs. Bulk loading (e.g. fastloads, TPUMP array inserts) is faster for NOPI tables than for primary index tables.
However, the Fastload has additional performance advantages when loading a NOPI table. First, we look at the phases of the fast load for a primary index table:
- incoming rows are randomly distributed to all AMPs.
- The lines are hashed through the primary index value and forwarded to the responsible AMPs.
- The responsible AMPs sort the received lines by ROWID
The fast load for NOPI tables ends after phase 1 as soon as the rows have been randomly distributed to the AMPs the load is finished.
There is no hashing and no redistribution of rows. Sorting is also not necessary.
Since the rows are randomly assigned to the AMPs, the rows are always evenly distributed across all AMPs, and no skewing occurs.
Another advantage of NOPI tables is that the rows are appended to the end of the data blocks of the respective table.
This avoids any overhead that is normally caused by sorting the rows by row hash into the data blocks.
For example, if the INSERT…SELECT statement contains massive amounts of rows, the number of IOs is significantly reduced.
No Primary Index Tables and Skewing
NOPI tables will not become skewed, with a few exceptions:
- Bulk loaded small tables will always be skewed. Round robin happens on block level and not the row level. Some AMPs will receive data blocks, and the rest will receive no data.
- If you execute an INSERT…SELECT statement, from a primary index table into an NOPI table, AMP-local copying of the rows will be applied. All rows stay on their current AMP, and if the primary index table is skewed you, will end up with a skewed NOPI table.
- Skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.
The Usage of No Primary Index Tables
NOPI tables are useful in certain situations, but without a primary index, row access is limited to all-AMP full table scans.
To compensate for this disadvantage, a unique secondary index (USI) can be added to a NOPI table.
There are standard features used by NOPI tables and the Teradata Columnar implementation, introduced with Teradata 14: Column partitioned tables are always NOPI tables, and although offering great performance improvements for certain workload types, they lack as well the fast primary index access.
To some extent, the mentioned disadvantage of no primary index tables can be compensated with join indexes or secondary indexes.
No Primary Index tables should not be used as production tables, mainly consider using them during the ETL-Process, in situations when anyway a full table scan is required.
NOPI tables come with some further restrictions, find the most important ones:
– Only MULTISET tables can be created
– No identity columns can be used
– NoPi tables cannot be partitioned with a PPI
– No statements with an update character allowed (UPDATE, MERGE INTO, aUPSERT), still you can use INSERT, DELETE and SELECT
– No Permanent Journal possible
– Cannot be defined as Queue Tables
– No Queue Tables allowed
– Update Triggers cannot update an NOPI table (probably introduces with a later release)
– No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables. The tablekind is ‘O’:
SELECT * FROM DBC.TABLES WHERE TABLEKIND = ‘O’;