A thorough understanding of the Teradata system’s technical internals is essential for mastering performance optimization. Many of us know the common optimization techniques, but resolving complex performance issues often requires an in-depth understanding of how data is stored internally.
Until recently, data storage relied solely on hard disks. The trend is now shifting toward solid-state disks, although hard drives are expected to remain in use for a considerable time.
Teradata systems currently employ both solid-state disks and hard drives for data storage.
Teradata’s data storage is organized into cylinders, the sectors within them, and data blocks that occupy those sectors. This is where our journey into its inner workings begins.
Data records are stored in blocks, and each block is allocated to a specific cylinder on the AMP’s disk.
Teradata Physical Storage – How can a table row be accessed?
The Parsing Engine sends requests to the AMPs. Each request contains the internal TABLEID, which uniquely identifies a table in a Teradata system. To locate a data record on disk, the AMP uses two indexes.
The Teradata Master Index
The master index records which cylinders hold a table’s rows, so the AMP does not have to read every cylinder to find a single data row, for example during primary index access. For each cylinder, the master index contains the lowest ROWID and the highest ROWHASH stored in it, which lets the AMP skip cylinders that cannot contain the requested row.
The master index maintains a sub-index of available cylinders for write operations such as inserts and updates.
The master index is cached in the FSG cache and readily accessible on every AMP for data retrieval.
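The lookup described above can be sketched in a few lines of Python. This is only an illustration, not Teradata’s actual implementation: the names `CylinderEntry` and `find_cylinder`, the tuple layout of the keys, and the assumption that entries are kept sorted are all hypothetical. A ROWID is modeled here as a row hash plus a uniqueness value.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CylinderEntry:
    """Hypothetical master index entry describing one cylinder."""
    cylinder_id: int
    low: tuple    # (table_id, row_hash, uniqueness) of the lowest ROWID
    high: tuple   # (table_id, row_hash) of the highest ROWHASH


def find_cylinder(master_index: List[CylinderEntry],
                  table_id: int, row_hash: int) -> Optional[int]:
    """Return the id of the cylinder whose key range covers the row hash.

    Assumes master_index is sorted by key range and ranges do not overlap.
    """
    key = (table_id, row_hash)
    highs = [entry.high for entry in master_index]
    # First entry whose highest key is not below the searched key.
    i = bisect_left(highs, key)
    if i < len(master_index) and master_index[i].low[:2] <= key:
        return master_index[i].cylinder_id
    return None  # no cylinder can contain this row


master_index = [
    CylinderEntry(7, (100, 0, 1), (100, 4000)),
    CylinderEntry(12, (100, 4001, 1), (100, 9000)),
    CylinderEntry(3, (200, 10, 1), (200, 500)),
]
print(find_cylinder(master_index, 100, 4500))
```

Because only the lowest and highest keys per cylinder are compared, the AMP in this sketch never touches a cylinder whose range excludes the requested row.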
The Teradata Cylinder Index
After consulting the master index, the AMP knows which cylinders hold the table’s records. It then uses the cylinder index to pinpoint the sectors within a cylinder that hold the relevant data blocks, which in turn contain the required data rows. The cylinder index stores the first and last sector of every data block in a table.
Because the cylinder index is larger than the master index, it may not fit entirely in the FSG cache.
As in the master index, the lowest ROWID and the highest ROWHASH are available to restrict access to the sectors that contain the required records. The cylinder index also maintains a sub-index of all free disk sectors.
Once the AMP has determined the first and last sector of the data block containing the required data record(s), the block can be transferred to the FSG cache, where the select, update, insert, or delete operation is performed.
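As a rough sketch of this step, the cylinder index can be pictured as a list of block descriptors, each carrying a key range and a sector range. The names `BlockDescriptor` and `locate_block`, the field layout, and the idea that the key range is kept per data block are assumptions made for illustration only. A simple linear scan is used for readability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockDescriptor:
    """Hypothetical cylinder index entry describing one data block."""
    table_id: int
    low_row_id: tuple    # (row_hash, uniqueness) of the lowest ROWID
    high_row_hash: int   # highest ROWHASH in the block
    first_sector: int
    last_sector: int


def locate_block(cylinder_index: List[BlockDescriptor],
                 table_id: int, row_hash: int) -> Optional[range]:
    """Return the sectors of the data block that may hold the row hash."""
    for block in cylinder_index:
        in_range = block.low_row_id[0] <= row_hash <= block.high_row_hash
        if block.table_id == table_id and in_range:
            # These are the sectors to load into the cache.
            return range(block.first_sector, block.last_sector + 1)
    return None  # the row is not stored in this cylinder


cylinder_index = [
    BlockDescriptor(100, (4001, 1), 5000, 0, 127),
    BlockDescriptor(100, (5001, 1), 9000, 128, 255),
]
sectors = locate_block(cylinder_index, 100, 6000)
print(sectors.start, sectors.stop - 1)
```

Only the sectors returned by the lookup need to be read from disk; every other block in the cylinder is skipped.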
The Binary Search
The AMP will locate the requested data record(s) by performing a binary search on the data block(s).
The binary search algorithm efficiently locates a row through an array of pointers sorted by rowid. It repeatedly compares the rowid being searched for with the rowid of the row referenced by the middle element of the pointer array.
Each pointer marks the position of a table row. If the rowid at the middle element matches the rowid being searched for, the row is read from the position that element points to. If the searched rowid is smaller than the middle element’s rowid, the algorithm continues to the left of the middle element; if it is larger, it continues to the right. These steps repeat until the searched rowid is found or there are no more elements to search, which means the row does not exist.
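The search can be written down as a short Python sketch. The layout is an assumption for illustration: `block_rows` maps a byte offset within the block to a row, `pointer_array` lists those offsets in ascending rowid order, and `find_row` is a hypothetical helper name. Rowids are modeled as (row hash, uniqueness value) tuples.

```python
from typing import Dict, List, Optional


def find_row(block_rows: Dict[int, dict], pointer_array: List[int],
             target_row_id: tuple) -> Optional[dict]:
    """Binary search over a pointer array that is sorted by rowid."""
    lo, hi = 0, len(pointer_array) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        row = block_rows[pointer_array[mid]]  # follow the middle pointer
        if row["row_id"] == target_row_id:
            return row
        if target_row_id < row["row_id"]:
            hi = mid - 1   # continue in the left half
        else:
            lo = mid + 1   # continue in the right half
    return None  # no elements left: the row is not in this block


# Rows sit at arbitrary offsets; only the pointer array is in rowid order.
block_rows = {
    0: {"row_id": (900, 1), "payload": "C"},
    40: {"row_id": (100, 1), "payload": "A"},
    80: {"row_id": (500, 1), "payload": "B"},
}
pointer_array = [40, 80, 0]
print(find_row(block_rows, pointer_array, (900, 1)))
```

Note that the rows themselves are not physically sorted in this sketch. Only the pointer array has to be kept in rowid order, which is what makes the halving of the search range possible.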
In part two of “The Ultimate Teradata Physical Storage Guide” series, we’ll examine the data block structure, record storage within the blocks, and Teradata’s management of varying block size needs.
We will also look at how Teradata maintains data blocks, adding, shrinking, and splitting them as necessary to reduce fragmentation and keep the system fast.