Tuning on Teradata is all about minimizing costs and maximizing parallelism.
The crucial costs (and the ones we can influence) in SQL tuning are I/Os and CPU seconds. The best way to understand what they mean is to first examine the basics of a single computer.
Single computers are built from storage devices, RAM-based memory, and central processing units.
The CPU can process data only if it is available in memory.
Disks are the workhorse storage devices: they can hold enormous amounts of data, on the order of hundreds or thousands of terabytes, compared with the thousands of megabytes available in RAM.
Teradata stores table rows in data blocks. These blocks can dynamically grow when new rows are added and can be reduced if rows are removed or updated.
The minimum data block size is 512 bytes, and a block can grow to as large as one megabyte.
Even if we read only a single table row, a whole data block must be copied from disk to memory. Therefore, every time we execute an SQL SELECT statement, one or more data blocks must be copied from disks to memory (if not already there).
The cost of a single data block moved between disk and memory is called an I/O. Data block transfers are a costly process because the access to the disks is relatively slow.
For completeness, it should be noted that not every read access necessarily causes physical I/Os. Teradata uses a caching mechanism, Teradata Intelligent Memory, which keeps the most frequently used data blocks in a dedicated memory area.
If the required data blocks are available in the cache, disk access is omitted. Blocks read from the cache are called logical I/Os, while data blocks read from disk are called physical I/Os.
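The distinction between logical and physical I/Os can be sketched with a toy block cache. This is an illustrative model only, not Teradata's actual caching implementation; the names `BlockCache` and `read_block` are made up:

```python
# Toy model: every block request is a logical I/O; only cache misses
# that must copy the block from disk count as physical I/Os.

class BlockCache:
    def __init__(self):
        self.cache = {}           # block_id -> block data kept in memory
        self.logical_ios = 0      # counts every block request
        self.physical_ios = 0     # counts only requests that had to go to disk

    def read_block(self, block_id, disk):
        self.logical_ios += 1
        if block_id not in self.cache:       # cache miss: copy block from disk
            self.physical_ios += 1
            self.cache[block_id] = disk[block_id]
        return self.cache[block_id]

disk = {1: ["row-a", "row-b"], 2: ["row-c"]}
cache = BlockCache()
cache.read_block(1, disk)   # physical I/O (first access goes to disk)
cache.read_block(1, disk)   # logical I/O only (served from cache)
cache.read_block(2, disk)   # physical I/O
print(cache.logical_ios, cache.physical_ios)  # 3 2
```

The second read of block 1 is cheap because the block is already in memory, which is exactly the benefit the cache provides.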
Typical I/O cost patterns for SQL statements are:
For the SQL SELECT:
– Copy data blocks containing required rows from disk to memory
For the SQL UPDATE:
– Copy data blocks containing required rows from disk to memory
– Apply changes in memory
– Write changed data blocks back to disk
For the SQL INSERT:
– Copy data blocks which will hold the new rows from disk to memory
– Assemble the rows in memory
– Write data blocks back to disk
For the SQL DELETE:
– Copy data blocks which contain the rows to be deleted from disk to memory
– Delete the rows in memory
– Write data blocks back to disk
Additional I/O overhead might be caused by growing and shrinking data blocks dynamically in order to adjust them to the number of rows being stored.
Joins may require relocation and sorting of data. Both processes can cause a vast number of I/Os. We will discuss join costs in detail in a later chapter.
Minimizing data block transfers is one of our primary optimization goals. It can be achieved by reducing the number of data block transfers (I/Os) or by reducing how often they occur.
In this respect, Teradata works exactly like a single computer: it copies data between disk and memory.
What distinguishes Teradata from a single computer is its Shared Nothing Architecture, which consists of many parallel units.
The Shared Nothing Architecture
– slices a big task vertically into smaller tasks, and
– executes those tasks independently and simultaneously in parallel.
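This divide-and-conquer idea can be sketched in a few lines: one big task (summing a list) is sliced into independent parts, each processed by its own worker, and the partial results are combined. The slice count of four is an arbitrary choice for illustration:

```python
# Minimal sketch of the shared-nothing idea: slice one big task into
# independent sub-tasks, run them in parallel, and merge the results.
from concurrent.futures import ThreadPoolExecutor

def subtotal(chunk):
    # each "unit" works only on its own slice of the data
    return sum(chunk)

data = list(range(1, 101))                      # the "big task"
slices = [data[i::4] for i in range(4)]         # slice it into 4 parts
with ThreadPoolExecutor(max_workers=4) as ex:
    partials = list(ex.map(subtotal, slices))   # each slice is processed independently
total = sum(partials)                           # combine the partial results
print(total)  # 5050, same as sum(data)
```

No slice needs to see another slice's data to do its work, which is the defining property of a shared-nothing design.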
We can imagine each parallel unit as being an independent, single computer.
Parallelism on Teradata is based on three components: The AMPs (or access module processors), the PEs (or Parsing Engines), and the BYNET, a network connecting Parsing Engines and AMPs and AMPs among each other.
Together these components handle each SQL request.
The Parsing Engine is the entry point into the database. As users log on to the database, they establish a Teradata session. Within each of these sessions, users submit SQL requests.
The parsing step decomposes SQL requests into basic instructions for the AMPs. A high-level view of these instructions is the Execution Plan we talked about earlier.
The Parsing Engine distributes processing tasks across all the AMPs. When an AMP finishes its work, it will pass the result rows via the BYNET network back to the Parsing Engine, which in turn transfers the rows to the client tool. The client tool could be, for example, your SQL Assistant or BTEQ.
The AMPs read and write data blocks; they do the data processing and are the database’s workhorses. Parallelism in Teradata is implemented by distributing the rows of each table across all AMPs.
– AMPs are responsible for storing and retrieving rows
– AMPs sort rows and do aggregations
– AMPs handle all the join processing
Again, you can see that these tasks are tightly coupled to the steps of the Execution Plan.
The AMPs are designed to work in parallel to complete the request in the shortest possible time. Optimally, every AMP should contain a subset of all the rows of each table. By dividing up the data, it automatically divides up the work.
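The distribution of rows across AMPs can be sketched by hashing a key column and mapping the hash value to an AMP. This is a toy illustration: the MD5-based hash, the AMP count of four, and the `customer_id` key are all assumptions, not Teradata's actual row-hash algorithm:

```python
# Illustrative sketch: assign each row to an AMP by hashing its key column.
import hashlib

NUM_AMPS = 4

def amp_for(key_value):
    # deterministic hash of the key, mapped onto one of the AMPs
    h = hashlib.md5(str(key_value).encode()).hexdigest()
    return int(h, 16) % NUM_AMPS

amps = {i: [] for i in range(NUM_AMPS)}
for customer_id in range(1, 21):            # 20 rows, keyed by customer_id
    amps[amp_for(customer_id)].append(customer_id)

print([len(rows) for rows in amps.values()])
```

Because the hash is deterministic, the same key always lands on the same AMP, and a reasonably unique key spreads the rows evenly across all AMPs.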
In SQL tuning, the desired goal is that our query’s workload is evenly distributed across all parallel units so that all AMPs return their rows at the same time back to the Parsing Engine. If only one AMP is slower, it will cause a bottleneck, and the whole system performance will suffer.
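The cost of a single slow AMP can be made concrete with a small calculation: the elapsed time of a parallel step is the time of the slowest AMP, not the average. The per-AMP timings below are invented for illustration:

```python
# Sketch of why one skewed AMP throttles the whole request.

amp_seconds = [10, 10, 10, 40]          # per-AMP work time; one AMP is skewed

elapsed = max(amp_seconds)              # the request finishes with the slowest AMP
total_work = sum(amp_seconds)
ideal = total_work / len(amp_seconds)   # elapsed time if work were spread evenly
efficiency = ideal / elapsed            # fraction of the parallel capacity used

print(elapsed, ideal, efficiency)       # 40 17.5 0.4375
```

With an even distribution, the same 70 seconds of work would finish in 17.5 seconds; the skewed AMP more than doubles the elapsed time while three AMPs sit idle.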
Each AMP has its dedicated memory and its own virtual disk. Each AMP can do its work independently from any other AMP. This is, by the way, one of the main characteristics of a Shared Nothing Architecture.
A virtual disk is a group of disk cylinders assigned to each AMP. Not all cylinders of an AMP's virtual disk have to be located on the same physical device. No AMP can access the memory or virtual disk of another AMP.
Maximizing parallelism is another primary optimization goal. This goal can be achieved by ensuring an even distribution of rows across all AMPs during all steps of executing an SQL statement.