Teradata SQL Tuning: Minimizing Costs and Maximizing Parallelism

Roland Wenzlofsky

April 23, 2023


Teradata tuning focuses on optimizing performance while minimizing expenses and maximizing parallel processing.

The primary expenses that can be controlled during SQL tuning are I/O and CPU usage. To understand why, it helps to look at the fundamental components of a single computer.

Computers consist of storage devices, RAM, and CPUs.

The CPU relies on data stored in memory for processing.

Disks are persistent storage devices that hold vast amounts of data, ranging from hundreds to thousands of terabytes. This starkly contrasts with RAM, which holds only thousands of megabytes.

Table rows in Teradata are stored in data blocks that can expand dynamically when new rows are added and contract when rows are removed or updated.

The minimum block size for data is 512 Bytes, but it can increase to one Megabyte.
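The maximum block size can be influenced per table. A minimal sketch of the relevant DDL clause, with a hypothetical table and a size value that depends on the Teradata release:

```sql
-- Hypothetical table; the DATABLOCKSIZE option caps the maximum block size.
-- Valid limits depend on the Teradata release.
CREATE TABLE sales_fact,
    DATABLOCKSIZE = 130560 BYTES  -- larger blocks favor full-table scans
(
    sale_id   INTEGER,
    sale_date DATE,
    amount    DECIMAL(18,2)
)
PRIMARY INDEX (sale_id);
```

Larger blocks move more rows per I/O, which helps sequential scans; smaller blocks can reduce wasted transfer for single-row access.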

Executing an SQL SELECT statement necessitates copying one or more data blocks from disks to memory, even if we only read a single table row.

The expense of moving a solitary data block between disk and memory is referred to as an I/O. Data block transfers are expensive due to the relatively sluggish access to disks.

It should be noted that not every read access results in physical I/Os. Teradata employs a caching mechanism called Teradata Intelligent Memory that keeps the most frequently accessed data blocks in a dedicated memory area.

If the necessary data blocks are already present in the cache, disk access is avoided. Block accesses served from the cache are counted as logical I/Os, whereas data blocks that must be fetched from disk cause physical I/Os.
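The I/O consumption of past queries can be inspected in the query log, assuming DBQL logging is enabled. A sketch against the DBC views (the user name is a placeholder):

```sql
-- Requires query logging (DBQL) to be active for the user or account.
-- TotalIOCount reports the I/Os the optimizer charged to the request.
SELECT QueryID,
       TotalIOCount,
       AMPCPUTime
FROM   DBC.DBQLogTbl
WHERE  UserName = 'ETL_USER'  -- hypothetical user
ORDER  BY TotalIOCount DESC;
```

Queries at the top of this list are usually the most rewarding tuning candidates.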

Typical I/O cost patterns for SQL statements are:

For a SELECT:

– Copy data blocks containing required rows from disk to memory

For an UPDATE:

– Copy data blocks containing required rows from disk to memory
– Apply the changes in memory
– Write the changed data blocks back to disk

For an INSERT:

– Copy the data blocks that will hold the new rows from disk to memory
– Assemble the rows in memory
– Write the data blocks back to disk

For a DELETE:

– Copy the data blocks containing the rows to be deleted from disk to memory
– Delete the rows in memory
– Write the data blocks back to disk

Dynamically resizing data blocks to accommodate stored rows may result in additional I/O overhead.

For Joins:

Joins may necessitate relocating and sorting data, resulting in significant I/Os. A comprehensive discussion of join costs will be presented in a subsequent chapter.
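Whether a join forces such data movement can be seen without running the query. A minimal sketch using the EXPLAIN modifier, with hypothetical table names:

```sql
-- EXPLAIN returns the execution plan as text, e.g. phrases such as
-- "redistributed by the hash code of" or "duplicated on all AMPs",
-- without executing the query.
EXPLAIN
SELECT c.customer_id,
       SUM(s.amount)
FROM   sales_fact   s
JOIN   customer_dim c
       ON s.customer_id = c.customer_id
GROUP  BY c.customer_id;
```

Plans that redistribute or duplicate large tables are typically the ones worth tuning first.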

Minimizing data block transfers is a key optimization objective, attained by reducing the total number of data blocks a statement must move between disk and memory (I/Os).

Up to this point, Teradata behaves like a single computer, moving data between disk and memory.

Teradata’s Shared Nothing Architecture stands apart from a personal computer (PC) because of its parallel unit configuration.

The Shared Nothing Architecture

– slices a big task into smaller tasks, and
– executes those tasks independently and simultaneously in parallel

We can imagine each parallel unit as being an independent, single computer.

Teradata’s parallelism relies on three essential components: AMPs (access module processors), PEs (parsing engines), and BYNET, a network that enables the communication between parsing engines and AMPs, as well as AMPs with each other.

These components collectively manage SQL requests.

The Parsing Engine serves as the gateway to the database. Once users successfully access the database, they initiate a Teradata session through which they can submit SQL queries.

Parsing decomposes SQL requests into basic AMP instructions. The Execution Plan, as previously mentioned, provides a high-level overview.

The Parsing Engine delegates the processing steps to the AMPs; the AMPs return their results via the BYNET network to the Parsing Engine, which passes the rows on to the client tool of choice, such as SQL Assistant or BTEQ.

AMPs handle data blocks and perform database operations, serving as the system’s workhorses. Teradata implements parallelism by distributing table rows among all AMPs.

– AMPs are responsible for storing and retrieving rows
– AMPs sort rows and do aggregations
– AMPs handle all the join processing

These tasks are closely linked to the Execution Plan steps.

The AMPs function concurrently to fulfill the request swiftly. Ideally, each AMP should possess a portion of all the rows within a table. This division of data results in an automatic distribution of labor.

SQL tuning aims to distribute our query’s workload evenly among all parallel units, so that all AMPs finish their work and return their rows to the Parsing Engine at the same time. If a single AMP works slower than the others, it becomes a bottleneck and slows down the entire request.
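Whether a table's rows are spread evenly can be checked with Teradata's hash functions. A sketch with hypothetical names, where sale_id stands for the table's primary index column:

```sql
-- Counts the rows each AMP holds for one table; a lopsided result means skew.
-- HASHROW computes the row hash, HASHBUCKET maps it to a hash bucket,
-- and HASHAMP resolves the bucket to the owning AMP.
SELECT HASHAMP(HASHBUCKET(HASHROW(sale_id))) AS amp_no,
       COUNT(*)                              AS row_cnt
FROM   sales_fact
GROUP  BY 1
ORDER  BY row_cnt DESC;
```

If a few AMPs hold far more rows than the rest, the primary index choice is usually the first thing to revisit.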

Each AMP possesses its own assigned memory and virtual disk, allowing it to operate autonomously from other AMPs. This feature is a fundamental aspect of a Shared Nothing Architecture.

A virtual disk comprises disk cylinders designated to each AMP, which need not necessarily be situated on a single physical device. Each AMP’s memory and virtual disk are inaccessible to other AMPs.


Maximizing parallelism is a key optimization objective that can be attained by evenly distributing rows across all AMPs during the execution of an SQL statement.
