What are No Primary Index tables?
Starting with Teradata Release 13, tables don’t require a Primary Index.
Nevertheless, the primary index still is the main idea to meet an even data distribution on a Teradata system. By design, the primary index ensures linear scalability.
Hence, the question is: What is the meaning of tables without a primary index, how are they implemented, and how do they fit into the hashing design of Teradata?
The rows of a No Primary Index (NOPI) table are distributed randomly across all AMPs. No hashing takes place, but rows have to be uniquely identifiable. Therefore the ROWID is generated differently than the ROWID of a primary index table:
Teradata uses the HASHBUCKET of the responsible AMP and adds a uniqueness value. The bytes usually occupied by the hash value are used to increase the range for generating “uniqueness” values.
Below is the syntax to create a No Primary Index Table:
CREATE TABLE <TABLE> (
PK INTEGER NOT NULL
) NO PRIMARY INDEX;
If neither PRIMARY INDEX nor NO PRIMARY INDEX is defined, depending on the DBS control field PrimaryIndexDefault, the result may be a table without primary index (NO PRIMARY INDEX, PrimaryIndexDefault=P, D); or the first not nullable column of the table definition will be used as the PRIMARY INDEX (if all columns are nullable, the first nullable column of the CREATE TABLE statement is used as the PRIMARY INDEX (PrimaryIndexDefault=N).
How No Primary Index Tables work
No Primary Index tables are distributed randomly across the AMPs. Therefore bulk loading of NOPI tables (Fastloads, TPUMP array insert loads) is faster than loading tables with a Primary Index.
There is a definite performance impact in FastLoads:
1. Incoming rows are distributed in a round-robin fashion randomly across all AMPs
2. The rows are hashed by the primary index value and forwarded to the responsible AMPs
3. The responsible AMPs sort the received rows by ROWID
Let us consider a No Primary Index table: After distributing the rows randomly across the AMPs, it’s finished.
No hashing and redistribution take place. No sorting is needed.
Furthermore, as the rows are assigned randomly to the AMPs, our data will always be distributed evenly across all AMPs, and no skewing happens.
All activities described above speed up the load process, as only the acquisition phase of the loading utilities is executed.
Another advantage of NOPI tables is that the rows will be appended to the end of the table’s data blocks. Thus any overhead usually caused by sorting the rows by row hash into the data blocks is avoided. For example, in case you INSERT…SELECT massive amounts of rows into your NOPI table, above described effect, will cut the number of IOs significantly.
No Primary Index Tables and Skewing
NOPI tables will not become skewed, with a few exceptions:
- Bulk loaded small tables will always be skewed. Round robin happens on block level and not the row level. Some AMPs will receive data blocks, and the rest will receive no data.
- If you execute an INSERT…SELECT statement, from a primary index table into an NOPI table, AMP-local copying of the rows will be applied. All rows stay on their current AMP, and if the primary index table is skewed you, will end up with a skewed NOPI table.
- Skewing can happen, by executing an archive job on a source system, and a restore of the backup on a different target system.
The Usage of No Primary Index Tables
NOPI tables are useful in certain situations, but without a Primary Index, row access is limited to All-AMP full tables scans. To offset this impact, you could add a unique secondary index (USI) to your NOPI table.
There are standard features used by NOPI tables and the Teradata Columnar implementation, introduced with Teradata 14: Column partitioned tables are always NOPI tables, and although offering great performance improvements for certain workload types, they lack as well the fast primary index access.
To some extent, the mentioned disadvantage of no primary index tables can be compensated with join indexes or secondary indexes.
No Primary Index tables should not be used as production tables, mainly consider using them during the ETL-Process, in situations when anyway a full table scan is required.
NOPI tables come with some further restrictions, find the most important ones:
– Only MULTISET tables can be created
– No identity columns can be used
– NoPi tables cannot be partitioned with a PPI
– No statements with an update character allowed (UPDATE, MERGE INTO, aUPSERT), still you can use INSERT, DELETE and SELECT
– No Permanent Journal possible
– Cannot be defined as Queue Tables
– No Queue Tables allowed
– Update Triggers cannot update an NOPI table (probably introduces with a later release)
– No hash indexes are allowed (use join indexes instead)
The following features can be used as usual together with NOPI tables:
– Fallback protection of the table
– Secondary Indexes (USI, NUSI)
– Join Indexes
– CHECK and UNIQUE constraints
– Collection of statistics
The information about NOPI tables, as in the case of any table type, is stored in DBC.Tables. The tablekind is ‘O’:
SELECT * FROM DBC.TABLES WHERE TABLEKIND = ‘O’;