Teradata SQL Tuning Goals

Roland Wenzlofsky

September 3, 2017


Tuning on Teradata is all about minimizing costs and maximizing parallelism.

The crucial costs (and the ones we can influence) in SQL tuning are I/Os and CPU seconds. The best way to understand their meaning is to examine a single computer’s basics.

Single computers are built from storage devices, RAM-based memory, and central processing units.

The CPU can process data only if it is available in memory.

Disks are the workhorse storage devices. They can hold enormous amounts of data: hundreds to thousands of terabytes, compared with the thousands of megabytes of RAM-based memory.

Teradata stores table rows in data blocks. These blocks can dynamically grow when new rows are added and can be reduced if rows are removed or updated.

The minimum data block size is 512 bytes; a block can grow as large as one megabyte.

Even if we read only a single table row, a whole data block must be copied from disk to memory. Therefore, every time we execute an SQL SELECT statement, one or more data blocks must be copied from disks to memory (if not already there).

The cost of a single data block moved between disk and memory is called an I/O. Data block transfers are costly because access to the disks is relatively slow.
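The all-or-nothing nature of block transfers can be sketched in a few lines of Python. This is an illustrative model, not Teradata internals; the block and row sizes are assumed example values.

```python
# Illustrative sketch (not Teradata internals): reading any row moves a
# whole data block. Block and row sizes below are assumed example values.
BLOCK_SIZE = 64 * 1024                    # bytes per data block (example)
ROW_SIZE = 200                            # bytes per row (example)
ROWS_PER_BLOCK = BLOCK_SIZE // ROW_SIZE   # 327 rows fit in one block

def blocks_to_read(row_ids):
    """Number of distinct blocks that must be copied from disk to memory."""
    return len({rid // ROWS_PER_BLOCK for rid in row_ids})

print(blocks_to_read([42]))                   # 1: one row still costs a full block
print(blocks_to_read(range(ROWS_PER_BLOCK)))  # 1: clustered rows share the cost
print(blocks_to_read([0, 5000]))              # 2: rows in different blocks
```

Note how rows that happen to sit in the same block amortize a single transfer, while the same number of rows scattered across many blocks multiply the I/O cost.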

To be complete: not every read access necessarily causes physical I/Os. Teradata uses a caching mechanism, Teradata Intelligent Memory, which keeps the most frequently used data blocks in a dedicated memory area.

Disk access is omitted if the required data blocks are available in the cache. Blocks read from the cache are called logical I/Os, while data blocks read from the disk are called physical I/Os.
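The distinction between logical and physical I/Os can be illustrated with a toy LRU cache. This is a conceptual sketch only, not the actual Teradata Intelligent Memory implementation; following the definition above, reads served from the cache count as logical I/Os and reads that go to disk count as physical I/Os.

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU block cache; a conceptual sketch, not how Teradata
    Intelligent Memory is actually implemented."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.logical = 0    # blocks served from the cache
        self.physical = 0   # blocks that had to be read from disk

    def read(self, block_id):
        if block_id in self.blocks:
            self.logical += 1
            self.blocks.move_to_end(block_id)    # mark as recently used
        else:
            self.physical += 1                   # disk access required
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

cache = BlockCache(capacity=2)
for block in [1, 2, 1, 3, 1]:
    cache.read(block)
print(cache.logical, cache.physical)   # 2 logical I/Os, 3 physical I/Os
```

Re-reading a recently used block is cheap; only the misses pay the price of slow disk access.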

Typical I/O cost patterns for SQL statements are:

For SELECTs:

– Copy data blocks containing the required rows from disk to memory

For UPDATEs:

– Copy data blocks containing the required rows from disk to memory
– Apply the changes in memory
– Write the changed data blocks back to disk

For INSERTs:

– Copy data blocks that will hold the new rows from disk to memory
– Assemble the rows in memory
– Write the data blocks back to disk

For DELETEs:

– Copy data blocks that contain the rows to be deleted from disk to memory
– Delete the rows in memory
– Write the data blocks back to disk

Additional I/O overhead might be caused by dynamically growing and shrinking data blocks to adjust them to the number of stored rows.
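The read/write patterns above can be summarized in a small cost table. This is a simplified sketch that ignores block growth, indexes, and logging; the counts are per affected data block.

```python
# Simplified sketch of the read/write pattern per statement type, following
# the steps listed above. Counts are per affected data block; overhead such
# as block resizing, indexes, and logging is ignored.
IO_PATTERN = {
    "SELECT": (1, 0),  # read block in; nothing written back
    "UPDATE": (1, 1),  # read in, change in memory, write back
    "INSERT": (1, 1),  # read target block in, assemble rows, write back
    "DELETE": (1, 1),  # read in, remove rows, write back
}

def io_cost(statement, blocks_touched):
    reads, writes = IO_PATTERN[statement]
    return blocks_touched * (reads + writes)

print(io_cost("SELECT", 10))  # 10 I/Os
print(io_cost("UPDATE", 10))  # 20 I/Os: changed blocks must go back to disk
```

The asymmetry is the point: statements that modify data roughly double the block transfers, because every changed block travels in both directions.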

For Joins:

Joins may require relocating and sorting data. Both processes can cause a vast number of I/Os. We will discuss join costs in great detail in a later chapter.

Minimizing data block transfers is one of our essential optimization goals. It can be achieved by reducing the number of blocks that need to be moved or by moving them less frequently.

In this respect, Teradata works exactly like a single computer: it copies data between disk and memory.

What distinguishes Teradata from a PC is its Shared Nothing Architecture, which consists of many parallel units.

The Shared Nothing Architecture

– Slices a big task into many smaller tasks and
– Performs those tasks independently and simultaneously in parallel

We can imagine each parallel unit as being an independent, single computer.

Parallelism on Teradata is based on three components: The AMPs (or access module processors), the PEs (or Parsing Engines), and the BYNET, a network connecting Parsing Engines and AMPs and AMPs among each other.

Together these components handle each SQL request.

The Parsing Engine is the entry point into the database. As users log on to the database, they establish a Teradata session. Within each of these sessions, users submit SQL requests.

The Parsing Engine decomposes SQL requests into basic instructions for the AMPs. A high-level view of these instructions is the Execution Plan we talked about earlier.

The Parsing Engine distributes processing tasks across all the AMPs. When an AMP finishes its work, it will pass the result rows via the BYNET network back to the Parsing Engine, which transfers the rows to the client tool. The client tool could be, for example, your SQL Assistant or BTEQ.

The AMPs read and write data blocks; they process data and are the database’s workhorses. Parallelism in Teradata is implemented by distributing the rows of each table across all AMPs.

– AMPs are responsible for storing and retrieving rows
– AMPs sort rows and do aggregations
– AMPs handle all the join processing
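The row distribution underlying all of this can be sketched with a simple hash function. The MD5 hash and the `customer_id` column here are assumptions for demonstration only; Teradata uses its own row-hash algorithm on the Primary Index columns.

```python
import hashlib

# Illustrative sketch: rows land on AMPs by hashing a column value (in
# Teradata, the Primary Index columns). MD5 is an assumption used here
# for demonstration, not Teradata's actual row-hash algorithm.
NUM_AMPS = 4

def amp_for(primary_index_value):
    digest = hashlib.md5(str(primary_index_value).encode()).hexdigest()
    return int(digest, 16) % NUM_AMPS

placement = {amp: [] for amp in range(NUM_AMPS)}
for customer_id in range(1000):                  # hypothetical key values
    placement[amp_for(customer_id)].append(customer_id)

for amp, rows in placement.items():
    print(f"AMP {amp}: {len(rows)} rows")        # counts come out roughly even
```

Because the hash spreads distinct key values evenly, each AMP ends up with roughly the same share of the table, which is exactly what lets the work be divided evenly.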

Again, you can see that these tasks are tightly coupled to the steps of the Execution Plan.

The AMPs are designed to work in parallel to complete the request in the shortest possible time. Optimally, every AMP should contain a subset of each table’s rows. By dividing up the data, it automatically divides up the work.

In SQL tuning, the desired goal is that our query’s workload is evenly distributed across all parallel units so that all AMPs return their rows to the Parsing Engine simultaneously. If only one AMP is slower, it will cause a bottleneck, and the whole system performance will suffer.
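Why a single slow AMP dominates the elapsed time can be shown with a back-of-the-envelope model. The row counts and processing rate below are made-up numbers for illustration.

```python
# Sketch of why skew hurts: a query finishes only when the slowest AMP
# finishes, so elapsed time is set by the busiest AMP, not the average.
# Row counts and the processing rate are made-up illustration values.
def elapsed_seconds(rows_per_amp, rows_per_second=1_000_000):
    return max(rows_per_amp) / rows_per_second

even   = [250_000] * 4                          # 1M rows spread evenly
skewed = [700_000, 100_000, 100_000, 100_000]   # same 1M rows, one hot AMP

print(elapsed_seconds(even))    # 0.25 s
print(elapsed_seconds(skewed))  # 0.7 s: one overloaded AMP sets the pace
```

Both scenarios process the same one million rows, yet the skewed distribution takes nearly three times as long, because three AMPs sit idle while the hot AMP finishes.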

Each AMP has its dedicated memory and its virtual disk. Each AMP can do its work independently from any other AMP. This is, by the way, one of the main characteristics of a Shared Nothing Architecture.

A virtual disk is a group of disk cylinders assigned to each AMP. Not all cylinders of an AMP’s virtual disk have to be located on the same physical device. No AMP can access the memory or virtual disk of another AMP.


Maximizing parallelism is another primary optimization goal. This goal can be achieved by ensuring an even distribution of rows across all AMPs during all steps of executing an SQL statement.
