Teradata data access is categorized based on the number of AMPs involved and the technique employed to search rows within a data block.
In our article about SQL Tuning, we mentioned that one of the main goals in performance optimization is to distribute the workload evenly across all AMPs.
It is best to avoid confusion using a single AMP when retrieving only a few rows. However, when retrieving many rows, all-AMP access is typically the optimal method.
Learn advanced applications for single-AMP access with our article on Tactical Workload Tuning.
This article will explore the various combinations of AMPs and search algorithms within data blocks and their specific applications within Teradata.
Single AMP – Binary Search
The Primary Index is the key representative for a Single-AMP Binary search.
A Primary Index query necessitates one AMP to fetch the eligible row/s from one or multiple data blocks via a binary search.
Unique Primary Indexes eliminate the creation of spool files. Additionally, accessing tables using the Primary Index enables row hash locking, which is the most precise locking mechanism. Row hash locking facilitates concurrent access to a single table, improving parallelism.
Primary Index access is the most cost-effective method to locate disk rows.
Primary Index queries necessitate calculating the Rowhash of qualifying rows from the identical columns as the Primary Index Rowhash of the accessed table.
The Row Hash is calculated by passing the column values into the hash function if the WHERE clause includes all Primary Index columns.
The Row Hash retrieves rows from their respective AMPs. It is a single-AMP operation since all rows with the same Row Hash are on the same AMP.
Here is an example:
Assuming the Customer Table’s Primary Index comprises the First and Last name fields, we aim to extract data pertaining to “Marianela Garcia”.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’ and Lastname = ‘Garcia’”
The row hash is calculated, and the information for “Marianela Garcia” is retrieved from the owning AMP due to the presence of all primary index columns in the WHERE clause.
If the WHERE clause is missing one Primary Index column, the qualifying rows will be scattered across multiple AMPs. The Parsing Engine must direct all AMPs to search for the corresponding rows.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’”
The WHERE clause can include all Primary Index and other columns.
The Parsing Engine calculates the Row Hash on the Primary Index to pinpoint the responsible AMP and filters non-Primary Index columns of the WHERE clause to identify the AMP.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’ AND Lastname = ‘Garcia’ AND Age = 28”
A maximum of 64 columns can be designated as the Primary Index. However, it is important to note that as more columns are included, the effectiveness of the Primary Index diminishes. This is because Primary Index access necessitates using all columns in the WHERE clause.
An expanding number of Primary Index columns diminishes usability.
Suppose we generate two identical tables, each with a distinct Primary Index.
Table one’s Primary Index is column A, while Table two’s Primary Index is a fusion of columns A, B, and C.
(1) PRIMARY INDEX (A);
(2) PRIMARY INDEX (A,B,C);
The Primary Index of Example 2 allows single-AMP access only for the third WHERE clause, while the Primary Index of Example 1 can be used for single-AMP access to all three WHERE clauses.
WHERE A=1 AND B=1
WHERE A=1 AND B=2 AND C=10
The Unique Secondary Index (USI) shares the same characteristics as previously described.
All AMP – Sequential Search (Full Table Scan)
A full table scan requires every AMP to perform a sequential search of all data blocks in order to find rows that meet the criteria.
A full table scan accesses each row once and allows access to any column combination in the WHERE clause. It is ideal when all rows require access. It reads every data block and typically necessitates a spool table of equivalent size to the base table.
A full table scan is necessary when the RowHash of the WHERE clause does not correspond to the Primary Index columns of the table being searched, and there are no other access paths, such as a secondary index, to utilize.
The full table scan, while requiring access to every data block of the table, can still be a favorable option. This is because all AMPs begin and end their tasks concurrently, resulting in optimal parallelism.
Base tables, non-unique secondary indexes, and join indexes can undergo full table scans.
All AMP – Binary Search
Teradata uses the All-AMP Binary search for the nonunique secondary index (NUSI).
Teradata keeps the NUSI index rows with the corresponding base table row rather than distributing them by RowHash. As a result, all AMPs must participate in the search for qualifying rows.
RowHash sorts NUSI index rows, enabling binary search for row location.