Teradata Architecture Is So Famous, But Why?

Roland Wenzlofsky

December 3, 2019


Before we go into Teradata architecture in more detail, we need to discuss how a computer is built. After all, this is also the basis of a Teradata system.

Teradata Architecture – Why Does Everyone Copy It?

As one of the pioneers in data warehousing, Teradata was and is a role model for many subsequent database systems in terms of architecture.

Even though Teradata is getting on in years, its developers considered many details from the very beginning that still keep it competitive today.

Looking at modern database systems such as Amazon Redshift or Netezza, we can recognize many concepts that Teradata was the first to use.

Teradata was designed for parallelism from the beginning, down to the smallest detail, and is still among the top RDBMSs for data warehousing today.

Single Computer

Data is permanently stored on mass storage devices and loaded into main memory so that the CPU can process it.

It is essential to understand that accessing the mass storage device is much slower than accessing the main memory. Further, accessing the main memory is much slower than accessing data already in one of the CPU caches.

Before the CPU can process data, the system must load it into the main memory.

The Teradata architecture can be easily imagined as many individual computers that can communicate with each other:

Teradata System

Teradata Data Distribution

To split the workload, Teradata uses a hashing algorithm that distributes the rows of each table evenly among the so-called AMPs. We will discuss later in this article exactly what an AMP is and what its tasks are; for now, it is sufficient to know that the AMPs do the main work.

Data Distribution by Hashing
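
To make the idea of hash distribution more concrete, here is a minimal Python sketch. It is not Teradata's actual algorithm: MD5 only stands in for Teradata's own hash function, and the names `NUM_AMPS`, `amp_for`, `distribute`, and the `customers` sample table are assumptions made up for this illustration. In Teradata, the value that gets hashed is the table's primary index; the sketch simply calls it `key`.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen for illustration


def amp_for(key_value, num_amps=NUM_AMPS):
    """Map a key value to an AMP number (illustrative only)."""
    # MD5 is a stand-in here; Teradata uses its own hashing algorithm.
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # keep 32 bits of the hash
    return row_hash % num_amps


def distribute(rows, key, num_amps=NUM_AMPS):
    """Return {amp_number: [rows]} for a list of dict rows."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


# Made-up sample table with a unique key column.
customers = [{"customer_id": i, "name": f"customer_{i}"} for i in range(10_000)]
placement = distribute(customers, key="customer_id")
rows_per_amp = Counter({amp: len(rows) for amp, rows in placement.items()})
print(rows_per_amp)
```

Because the key is unique, every AMP ends up with roughly a quarter of the rows. A key with only a few distinct values would pile the rows up on a few AMPs, which is why the choice of the hashed column matters.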

The Parsing Engine

An essential part of the Teradata architecture is the Parsing Engine (PE).

The Parsing Engine receives a request (e.g., an SQL statement) and generates an execution plan for all AMPs required to complete the request. Ideally, the plan is structured so that all AMPs start and finish their tasks at the same time, which ensures optimal parallel utilization of the system.
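
Why it matters that all AMPs finish at the same time can be shown with a small calculation. The following Python sketch rests on simplifying assumptions that are not from Teradata itself: every AMP works at the same hypothetical speed (`rows_per_second`), the work is proportional to the number of rows, and the function names are invented for this example. Under these assumptions, a parallel step takes as long as its slowest AMP.

```python
def step_elapsed(rows_per_amp, rows_per_second=1_000):
    """Elapsed time of one parallel step: the slowest AMP sets the pace."""
    return max(rows_per_amp) / rows_per_second


def parallel_efficiency(rows_per_amp):
    """Share of the available AMP time spent on useful work (1.0 = perfectly even)."""
    return sum(rows_per_amp) / (len(rows_per_amp) * max(rows_per_amp))


# Same total amount of work (10,000 rows), spread differently over 4 AMPs.
balanced = [2_500, 2_500, 2_500, 2_500]
skewed = [7_000, 1_000, 1_000, 1_000]

for name, load in (("balanced", balanced), ("skewed", skewed)):
    print(name, step_elapsed(load), round(parallel_efficiency(load), 2))
```

In the skewed case, three AMPs sit idle most of the time while one AMP does the bulk of the work, so the step takes almost three times as long even though the total number of rows is identical.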

The Parsing Engine controls the AMPs.

As you can see in the figure above, between the AMPs and the parsing engine is the BYNET, representing the communication network over which both the data and instructions are exchanged. We will talk about the BYNET in detail later in this article.

The Parsing Engine has the following main tasks:

  • Logging sessions on and off
  • Parsing of requests (syntax check, authorization check)
  • Preparation and optimization of the execution plan, using statistics to build an optimized plan
  • Controlling the AMPs by sending them instructions
  • Communication with the client software
  • EBCDIC to ASCII conversion in both directions
  • Transfer of the result of a request to the client tool

Each Teradata System can use multiple parsing engines.

The number of Parsing Engines can be increased as needed because each one can only handle a limited number of sessions.

Currently, each Parsing Engine can manage 120 sessions. These can be sessions of different users or 120 sessions of the same user.
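
The session limit translates directly into a minimum number of Parsing Engines. Here is a small Python sketch of that arithmetic; the function name `parsing_engines_needed` is made up for this example, and the calculation deliberately ignores anything else that might influence how many Parsing Engines a real system is configured with.

```python
import math

SESSIONS_PER_PE = 120  # limit quoted in the text above


def parsing_engines_needed(concurrent_sessions, sessions_per_pe=SESSIONS_PER_PE):
    """Smallest number of Parsing Engines that can hold the given sessions."""
    if concurrent_sessions < 0:
        raise ValueError("concurrent_sessions must not be negative")
    # Round up: a partly filled Parsing Engine is still a whole Parsing Engine.
    return math.ceil(concurrent_sessions / sessions_per_pe)


for sessions in (120, 121, 500):
    print(sessions, "sessions ->", parsing_engines_needed(sessions), "Parsing Engine(s)")
```

The 121st session is the interesting case: it no longer fits on one Parsing Engine, so a second one is required.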

The Teradata AMP

AMPs are the real workers in a Teradata system: they execute the instructions (the execution plan) they receive from the Parsing Engine.

AMPs are independent units, each with its own main memory and mass storage allocated to it.

The allocation is exclusive, i.e., no AMP has access to the resources of another AMP.

These are the main tasks of an AMP:

  • Storing and retrieving rows
  • Sorting of rows (for details, read How Teradata sorts the result set)
  • Aggregation of rows
  • Joining of tables (see also: The Essential Teradata Join Methods)
  • Locking of tables and rows
  • Output conversion ASCII to EBCDIC (if the client is a mainframe)
  • Management of its assigned space
  • Sending of rows to the Parsing Engine or other AMPs (via the BYNET)
  • Accounting
  • Recovery handling
  • Filesystem management

Each AMP can perform multiple tasks simultaneously. By default, each AMP can execute 80 tasks in parallel.
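
The following Python sketch illustrates the shared-nothing idea behind the AMPs, using aggregation as the example task. It is a toy model under stated assumptions, not a description of Teradata internals: the `Amp` class, the `parallel_sum` function, and the `sales` data are invented for this illustration, rows are placed with a simple modulo instead of real hashing, and the partial results are merged in one place purely for brevity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """A toy AMP: it owns its rows, and no other AMP reads them."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self._rows = []  # storage that belongs to this AMP only

    def store(self, row):
        self._rows.append(row)

    def local_sum(self, group_col, value_col):
        """Aggregate only the rows stored on this AMP."""
        totals = Counter()
        for row in self._rows:
            totals[row[group_col]] += row[value_col]
        return totals


def parallel_sum(amps, group_col, value_col):
    """Let every AMP aggregate its own rows in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        partials = pool.map(lambda amp: amp.local_sum(group_col, value_col), amps)
        result = Counter()
        for partial in partials:
            result.update(partial)  # Counter.update adds the values per key
    return dict(result)


amps = [Amp(i) for i in range(4)]
sales = [
    {"sale_id": i, "region": "north" if i % 2 else "south", "amount": 10}
    for i in range(100)
]
for row in sales:
    amps[row["sale_id"] % len(amps)].store(row)  # simple placement for the demo

totals = parallel_sum(amps, "region", "amount")
print(totals)
```

Each AMP touches only its own rows, so the expensive part of the work scales with the number of AMPs, and only the small partial results have to travel.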

The Teradata Node

Parsing Engines and AMPs run as software processes on a node. A node is usually a Linux machine equipped with multiple physical CPUs.

Each node can run hundreds of AMPs. Each AMP has its own portion of the main memory and its own portion of mass storage (called a virtual disk).

The Teradata Node

Nodes are connected to a disk array, and each AMP is assigned a part of it as a logical disk. Today, SSDs are also used, and the storage is managed by Teradata Intelligent Memory, but the principle is the same.

Node with Disk Array managed by Teradata Intelligent Memory

Massive Parallel Processing

A Teradata system can consist of a large number of nodes. These, in turn, are connected via BYNET.

Between nodes, the BYNET is a physical network. Within a node, however, the BYNET that connects the AMPs with the Parsing Engine and with each other is implemented in software:

Two Nodes combined with Hardware BYNET. Within each Node BYNET is Software
See how Hashing is done on Teradata
