Understanding Teradata Nodes in Parallel Systems: Architecture and Scalability

What is a Teradata Node?

A Teradata node is a Linux server with several physical multicore CPUs and ample memory; multiple nodes are typically housed together in a cabinet. Each node runs the Parallel Database Extensions (PDE) software on top of the Linux operating system.

Each node hosts the main components of a Teradata system (see our article about the Teradata high-level architecture):

– The Parsing Engines
– The AMPs
– Two redundant BYNETs for the communication between AMPs and Parsing Engines.

Parallelism within a node is achieved by uniformly distributing the workload among all AMPs.
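How evenly the work spreads depends on how rows are assigned to AMPs. Teradata assigns rows by hashing the primary index value; the sketch below only illustrates that idea. The function names are hypothetical, and SHA-256 is used as a stand-in, not as Teradata's actual row hash algorithm.

```python
import hashlib
from collections import Counter


def amp_for_key(key: str, num_amps: int) -> int:
    """Map a key to an AMP number (illustrative stand-in for a row hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an integer hash value.
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(keys, num_amps: int) -> Counter:
    """Count how many rows each AMP would receive for the given keys."""
    counts = Counter({amp: 0 for amp in range(num_amps)})
    for key in keys:
        counts[amp_for_key(key, num_amps)] += 1
    return counts


if __name__ == "__main__":
    # Many distinct key values: rows spread roughly evenly over all AMPs.
    unique_keys = [f"customer-{i}" for i in range(10_000)]
    print(sorted(rows_per_amp(unique_keys, 8).items()))

    # One repeated key value: every row lands on the same AMP.
    skewed_keys = ["UNKNOWN"] * 10_000
    print(sorted(rows_per_amp(skewed_keys, 8).items()))
```

With many distinct key values, each AMP receives a similar share of the rows. With a single repeated value, one AMP receives everything and the others stay idle.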

The Teradata architecture scales well: many nodes can be connected into one large system.

The idea that doubling the number of nodes doubles performance is often called linear scalability. In reality, this is largely a myth, because it holds only if the workload is perfectly parallel, that is, spread evenly across all AMPs.

In practice, workload skew is a well-known problem. If the work of a SQL statement stays on a single AMP, adding more nodes will not make that statement run faster. Keep this in mind when expanding your system.
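The effect of skew on scalability can be shown with a toy model. It assumes that a parallel step finishes only when the busiest AMP is done, so the elapsed time equals the largest per-AMP workload. All function names, the work units, and the AMP counts are hypothetical and chosen for illustration only.

```python
def elapsed_time(work_per_amp):
    """A parallel step ends when the busiest AMP has finished."""
    return max(work_per_amp)


def even_workload(total_work: float, num_amps: int):
    """Perfect parallelism: every AMP gets the same share."""
    return [total_work / num_amps] * num_amps


def skewed_workload(total_work: float, num_amps: int, hot_fraction: float):
    """One AMP handles hot_fraction of the work; the rest is spread evenly.

    Requires num_amps >= 2.
    """
    hot = total_work * hot_fraction
    rest = (total_work - hot) / (num_amps - 1)
    return [hot] + [rest] * (num_amps - 1)


def speedup(time_before: float, time_after: float) -> float:
    return time_before / time_after


if __name__ == "__main__":
    total = 1000.0  # arbitrary work units

    # Doubling the AMPs (for example by doubling the nodes), no skew.
    even = speedup(elapsed_time(even_workload(total, 8)),
                   elapsed_time(even_workload(total, 16)))
    print("speedup without skew:", even)

    # Same expansion, but all work of the statement sits on one AMP.
    skewed = speedup(elapsed_time(skewed_workload(total, 8, 1.0)),
                     elapsed_time(skewed_workload(total, 16, 1.0)))
    print("speedup with all work on one AMP:", skewed)
```

In this model, the evenly distributed workload runs twice as fast on twice as many AMPs, while the fully skewed workload does not speed up at all.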

In addition, fault tolerance is limited, and such architectures are not very resilient. Scaling a Teradata system to thousands of nodes is not feasible. Hot standby nodes provide some fault tolerance, but they come at a high cost.

In parallel-systems terminology, a single node is a symmetric multiprocessing (SMP) system. A system composed of at least two nodes is a massively parallel processing (MPP) system.

The communication network, BYNET, is implemented in software within a single node and in hardware between nodes, where it carries the communication between AMPs and Parsing Engines on different nodes.

Two BYNETs are always available for reasons of performance and fault tolerance.

Both networks are used concurrently to maximize throughput, provided they operate without errors. In the event of a network failure, the backup network ensures uninterrupted operation. Teradata becomes inoperative only if both networks fail.
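The behavior described above can be modeled conceptually as follows. This is a minimal sketch and not Teradata's implementation or API: the class, the link names "bynet0" and "bynet1", and the round-robin choice are assumptions made for illustration.

```python
class DualNetwork:
    """Conceptual model of two redundant networks (hypothetical names)."""

    def __init__(self):
        # Both links start healthy and share the traffic.
        self.healthy = {"bynet0": True, "bynet1": True}
        self._sent = 0

    def fail(self, name: str) -> None:
        """Mark one link as failed."""
        self.healthy[name] = False

    def send(self, message: str) -> str:
        """Pick a healthy link for the message and return its name."""
        available = [name for name, ok in self.healthy.items() if ok]
        if not available:
            # Only the loss of both links stops the system.
            raise RuntimeError("both networks failed: system is inoperative")
        # Alternate between the healthy links to use all available bandwidth.
        chosen = available[self._sent % len(available)]
        self._sent += 1
        return chosen


if __name__ == "__main__":
    net = DualNetwork()
    print([net.send("msg") for _ in range(4)])  # both links in use

    net.fail("bynet0")
    print([net.send("msg") for _ in range(2)])  # remaining link takes over
```

While both links are healthy, messages alternate between them. After one link fails, all traffic moves to the other, and an error is raised only when neither link is left.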

Years ago, the BYNET gave Teradata a notable advantage because it sorted and merged data itself, which reduced the CPU workload. With the advent of multicore processors, this benefit may no longer be as significant. The shift from BYNET to InfiniBand as the primary data transmission backbone has also contributed to this.

2 thoughts on “Understanding Teradata Nodes in Parallel Systems: Architecture and Scalability”

Hi Falcon. Thanks. I added the link.

Regarding your question: Yes, each AMP has its own working memory. The so-called FSG-cache of each AMP is used to hold the data blocks read from disk.

Roland – Thanks for an informative article. I have 2 comments/questions:

1. Can you also include the link in the article to ‘Teradata high-level architecture’ that you have referred to in your second paragraph?

2. How is TD’s RAM organized? Does each AMP have its own memory in addition to dedicated disk space? If not, can TD still be considered a shared-nothing architecture RDBMS?
