
Teradata Architecture Is So Famous, But Why?

By Roland Wenzlofsky

December 3, 2019


Before we look at the Teradata architecture in more detail, we need to talk about how a computer is built. After all, a single computer is also the basic building block of a Teradata system.

Teradata Architecture – Why Does Everyone Copy It?

As one of the pioneers in data warehousing, Teradata was and is a role model for many subsequent database systems in terms of architecture.

Even though Teradata has been around for many years, its developers considered many details from the very beginning that still keep Teradata competitive today.

If we look at modern database systems such as Amazon Redshift or Netezza, we can recognize many concepts that Teradata was the first to use.

Teradata was designed for parallelism down to the smallest detail from the start, and it still ranks among the top RDBMSs for data warehousing today.

Figure: Single Computer

Data is stored permanently on mass storage devices. Before the CPU can process it, the data must be loaded into main memory.

It is important to understand that accessing a mass storage device is much slower than accessing main memory. Accessing main memory, in turn, is much slower than accessing data that is already in one of the CPU caches.

The Teradata architecture can be easily imagined as a number of individual computers that can communicate with each other:

Figure: Teradata System

Teradata Data Distribution

To split the workload, Teradata uses a hashing algorithm that distributes the rows of each table evenly among the so-called AMPs. (Later in this article, we will discuss exactly what an AMP is and what its tasks are. For now, it is enough to know that the AMPs do the main work.)

Figure: Data Distribution by Hashing
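To make the principle concrete, here is a minimal Python sketch of hash-based row distribution. It is not Teradata's actual hashing algorithm: it assumes a 32-bit row hash computed from an MD5 digest of the row's primary index value, takes the upper 16 bits as a hash bucket, and maps buckets to AMPs round-robin through a hypothetical hash map. Only the idea carries over: the same value always lands on the same AMP, and many distinct values spread evenly.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 2 ** 16  # assumption: 16-bit hash buckets


def row_hash(pi_value) -> int:
    """32-bit row hash of a primary index value (MD5-based stand-in, NOT Teradata's hash function)."""
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def hash_bucket(rh: int) -> int:
    """Take the upper 16 bits of the 32-bit row hash as the bucket number."""
    return rh >> 16


def build_hash_map(num_amps: int) -> list[int]:
    """Hypothetical hash map: buckets are assigned to AMPs round-robin."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def target_amp(pi_value, hash_map: list[int]) -> int:
    """The AMP that owns a row is looked up via the row's hash bucket."""
    return hash_map[hash_bucket(row_hash(pi_value))]


hash_map = build_hash_map(num_amps=4)
# Distribute 100,000 rows with distinct (hypothetical) customer_id values
counts = Counter(target_amp(customer_id, hash_map) for customer_id in range(100_000))
print(sorted(counts.items()))
```

Because identical values always hash to the same bucket, all rows with the same primary index value end up on the same AMP; with many distinct values, the four printed counts are close to each other.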

The Parsing Engine

An important part of the Teradata architecture is the Parsing Engine (PE).

The parsing engine receives a request (e.g., an SQL statement) and generates an execution plan for all AMPs required to complete it. Ideally, the plan is structured so that all AMPs start and finish their tasks at the same time. This ensures optimal parallel utilization of the system.
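Why does it matter that all AMPs finish at the same time? A step executed in parallel is only complete when the slowest AMP is done. The following Python sketch, using made-up per-AMP row counts and an assumed constant cost per row, shows how an uneven workload (skew) stretches the elapsed time and lowers parallel efficiency.

```python
def step_elapsed(rows_per_amp: list[int], cost_per_row: float = 1.0) -> float:
    """Elapsed time of a parallel step: all AMPs start together, the step ends with the slowest AMP."""
    return max(rows_per_amp) * cost_per_row


def parallel_efficiency(rows_per_amp: list[int]) -> float:
    """Average AMP workload divided by the maximum workload (1.0 = perfectly even)."""
    return (sum(rows_per_amp) / len(rows_per_amp)) / max(rows_per_amp)


even = [250, 250, 250, 250]    # 1,000 rows spread evenly over 4 AMPs
skewed = [700, 100, 100, 100]  # the same 1,000 rows, mostly on one AMP

print(step_elapsed(even), parallel_efficiency(even))      # 250.0 1.0
print(step_elapsed(skewed), parallel_efficiency(skewed))  # 700.0 and roughly 0.357
```

The total work is identical in both cases; only its distribution differs, yet the skewed step takes almost three times as long because three AMPs sit idle while one finishes.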

Figure: The Parsing Engine controls the AMPs

As you can see in the figure above, the BYNET sits between the AMPs and the parsing engine. It is the communication network over which both data and instructions are exchanged. We discuss the BYNET in more detail later in this article.

The Parsing Engine has the following main tasks:

  • Logging sessions on and off
  • Parsing requests (syntax check, authorization check)
  • Preparing and optimizing the execution plan
  • Using statistics to build an optimized plan
  • Controlling the AMPs by sending them instructions
  • Communicating with the client software
  • EBCDIC to ASCII conversion in both directions
  • Transferring the result of a request to the client tool

Each Teradata system can have multiple parsing engines.

The number of parsing engines in a system can be increased as needed, because each parsing engine can only handle a limited number of sessions.

Currently, each parsing engine can manage up to 120 sessions. These can belong to different users, or all 120 can belong to the same user.
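Since each parsing engine handles at most 120 sessions, the minimum number of parsing engines for a given number of concurrent sessions is a simple ceiling division. The back-of-the-envelope Python sketch below uses a hypothetical helper name and ignores every other sizing factor; it only gives a lower bound.

```python
SESSIONS_PER_PE = 120  # per-PE session limit stated in this article


def min_parsing_engines(concurrent_sessions: int) -> int:
    """Smallest number of parsing engines able to host the given number of sessions (ceiling division)."""
    if concurrent_sessions < 0:
        raise ValueError("session count cannot be negative")
    return (concurrent_sessions + SESSIONS_PER_PE - 1) // SESSIONS_PER_PE


print(min_parsing_engines(120))  # 1
print(min_parsing_engines(121))  # 2
print(min_parsing_engines(500))  # 5
```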

The Teradata AMP

AMPs (Access Module Processors) are the real workers in a Teradata system. They execute the instructions they receive from the parsing engine (the execution plan).

AMPs are independent units that have their own main memory and mass storage allocated to them.

The allocation is exclusive, i.e., no AMP has access to the resources of another AMP.

These are the main tasks of an AMP:

  • Storing and retrieving rows
  • Sorting rows (for details, read How Teradata sorts the result set)
  • Aggregating rows
  • Joining tables (see also: The Essential Teradata Join Methods)
  • Locking tables and rows
  • Output conversion from ASCII to EBCDIC (if the client is a mainframe)
  • Managing its assigned space
  • Sending rows to the parsing engine or to other AMPs (via the BYNET)
  • Accounting
  • Recovery handling
  • Filesystem management
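The exclusive allocation of resources (often called a shared-nothing architecture) can be illustrated with a simplified Python model. Each hypothetical `Amp` object owns its rows and aggregates only those; a coordinator then combines the partial results, much as partial results travel over the BYNET. This is a two-phase aggregation sketch under simplifying assumptions (rows placed by a modulo of the key instead of a row hash, no redistribution step), not how Teradata implements aggregation internally.

```python
from collections import defaultdict


class Amp:
    """Hypothetical AMP: owns its rows exclusively; no other AMP reads them."""

    def __init__(self, amp_id: int):
        self.amp_id = amp_id
        self._rows: list[tuple[str, float]] = []  # (region, amount)

    def store(self, row: tuple[str, float]) -> None:
        self._rows.append(row)

    def local_sum_by_region(self) -> dict[str, float]:
        """Phase 1: aggregate only the rows this AMP owns."""
        partial = defaultdict(float)
        for region, amount in self._rows:
            partial[region] += amount
        return dict(partial)


def global_sum_by_region(amps: list[Amp]) -> dict[str, float]:
    """Phase 2: combine the partial aggregates sent by every AMP."""
    total = defaultdict(float)
    for amp in amps:
        for region, subtotal in amp.local_sum_by_region().items():
            total[region] += subtotal
    return dict(total)


amps = [Amp(i) for i in range(4)]
sales = [(1, "EU", 10.0), (2, "US", 5.0), (3, "EU", 2.5), (4, "APAC", 7.0), (5, "US", 1.0)]
for sale_id, region, amount in sales:
    # Simplification: place each row by sale_id modulo the number of AMPs
    amps[sale_id % len(amps)].store((region, amount))

print(global_sum_by_region(amps))
```

Each AMP only ever touches its own list of rows, so all four can aggregate in parallel; only the small partial results need to be exchanged.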

Each AMP can perform multiple tasks simultaneously. By default, up to 80 tasks (so-called AMP worker tasks) can run in parallel.

The Teradata Node

Parsing engines and AMPs are virtual processors (vprocs) that run on a node. A node is usually a Linux machine equipped with multiple physical CPUs.

Each node can run hundreds of AMPs. Each AMP has its own portion of main memory and its own portion of mass storage (called a virtual disk, or vdisk).

Figure: The Teradata Node

Nodes are connected to a disk array, and each AMP is assigned part of it as a logical disk. Nowadays, SSDs are commonly used and the storage is managed by Teradata Intelligent Memory, but the principle remains the same.

Figure: Node with Disk Array managed by Teradata Intelligent Memory

Massively Parallel Processing

A Teradata system can consist of a large number of nodes, which in turn are connected via the BYNET.

Between nodes, the BYNET is a physical network. Within a node, however, the BYNET that connects the AMPs with the parsing engines and with each other is implemented in software:

Figure: Two nodes connected by the hardware BYNET; within each node, the BYNET is implemented in software
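The difference between the software BYNET inside a node and the hardware BYNET between nodes can be sketched as a routing decision. The Python below assumes a hypothetical layout in which AMPs are numbered consecutively and placed on nodes in equal blocks; real configurations and the BYNET protocol itself are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    nodes: int
    amps_per_node: int

    def node_of(self, amp: int) -> int:
        """Assumption: AMPs 0..amps_per_node-1 run on node 0, the next block on node 1, and so on."""
        if not 0 <= amp < self.nodes * self.amps_per_node:
            raise ValueError(f"AMP {amp} does not exist in this layout")
        return amp // self.amps_per_node


def bynet_path(layout: Layout, src_amp: int, dst_amp: int) -> str:
    """Same node: the message stays in the software BYNET; otherwise it crosses the physical network."""
    if layout.node_of(src_amp) == layout.node_of(dst_amp):
        return "software BYNET (within node)"
    return "hardware BYNET (between nodes)"


layout = Layout(nodes=2, amps_per_node=4)
print(bynet_path(layout, 1, 3))  # software BYNET (within node)
print(bynet_path(layout, 1, 6))  # hardware BYNET (between nodes)
```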

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Vienna, Austria's capital, he has been building and tuning some of the largest Teradata data warehouses in the European financial and telecommunications sectors for more than 20 years. He has played every role: developer, designer, business analyst, and project manager. As a result, he knows better than most the problems and obstacles that make many data warehouse projects fail, as well as the tricks and tips that will help you succeed.
