Data access on Teradata can be classified by the number of involved AMPs, and the method Teradata is using to search the rows inside a data block.
In order to avoid any confusion: If we only need to retrieve one or a few rows, the best method is to use a single AMP. If we need to retrieve a huge amount of rows, all-AMP access is usually the best method.
We will show you some advanced applications for single-AMP access in our article about Tactical Workload Tuning.
In this article, we will discuss possible combinations of the number of AMPs and search algorithm inside the data blocks and when each of these combinations is used by Teradata.
Single AMP – Binary Search
The most important representative for a Single-AMP Binary search is the Primary Index.
A Primary Index query requires just a single AMP to retrieve the qualifying row from a single or multiple data blocks. Rows are located with a binary search.
If the Primary Index is unique, no spool file is created. Furthermore, Primary Index access allows for row hash locking, the most granular locking method available. Rowhash locking allows concurrent requests to access one table at the same time, increasing the level of parallelism.
There is no cheaper way to locate rows on a disk than the Primary Index access.
Primary Index queries require that the Rowhash of the qualifying rows was calculated from the same columns like the Rowhash of the Primary Index of the accessed table.
If a WHERE clause contains all Primary Index columns, the Row Hash is calculated by passing the column values into the hash function.
The resulting Row Hash is used to retrieve the rows from their responsible AMPs. This is always a single-AMP operation as all rows with the same Row Hash are located on the same AMP.
Here is an example:
Let’s assume that The Primary Index of the Customer Table is First name plus Lastname and we want to retrieve information about “Marianela Garcia”.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’ and Lastname = ‘Garcia’”
Because the WHERE clause contains all Primary Index columns, the Row Hash is calculated and the row with information about “Marianela Garcia” is retrieved from the owning AMP.
On the other hand, if just one Primary Index column is missing in the WHERE clause, qualifying rows will be spread across different AMPs. The Parsing Engine has to instruct all AMPs to search for the matching rows.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’”
Finally, the WHERE clause can be composed of all Primary Index columns and additional columns.
The Parsing Engine can identify the responsible AMP by calculating the Row Hash over the Primary Index and apply filtering for the non-Primary Index columns of the WHERE clause.
“SELECT * FROM Customer WHERE Firstname = ‘Marianela’ AND Lastname = ‘Garcia’ AND Age = 28”
Up to 64 columns can be defined as the Primary Index, but be aware that the more columns we use, the less useful it will become, as Primary Index access will require that all columns are used in the WHERE clause.
Here is an example how an increasing number of Primary Index columns reduces the usability:
Let’s assume we create two identical tables but with different Primary Indexes.
The Primary Index of the first table is column A. The Primary Index of the second table is a combination of the columns A, B, and C.
(1) PRIMARY INDEX (A);
(2) PRIMARY INDEX (A,B,C);
The Primary Index of example 2 allows single-AMP access only for the third WHERE clause, while the Primary Index of Example 1 can be used for single-AMP access of all three WHERE clauses.
WHERE A=1 AND B=1
WHERE A=1 AND B=2 AND C=10
All of the characteristics we described before also apply for the Unique Secondary Index (USI).
All AMP – Sequential Search (Full Table Scan)
The full table scan means that all AMPs have to read all data blocks with a sequential search to locate qualifying rows.
The full table scan touches each row only once and provides access for any combination of columns in the WHERE clause. It’s a good choice if all rows need to be accessed. It reads each data block and usually requires a spool table as large as the base table.
A full table scan is mandatory if the RowHash of the WHERE clause column values doesn't match the Primary Index columns of the searched table and no other access path, such as a secondary index, is available.
Although the full table scan needs to access each data block belonging to the searched table, it's not necessarily a bad choice, as all AMPs are starting and finishing their work at the same time, ensuring maximum parallelism.
Full table scans can be done on base tables, non-unique secondary indexes, and join indexes.
All AMP – Binary Search
Teradata uses the All-AMP Binary search for the nonunique secondary index (NUSI).
Because Teradata does not distribute the NUSI index rows by RowHash but keeps them together with the related base table row, all AMPs have to be involved in the search for qualifying rows.
Nevertheless, the NUSI index rows are sorted by RowHash, and a binary search can be applied to locate the rows.