What is a Teradata Node?
Teradata nodes are Linux systems with several physical multicore CPUs and plenty of memory; several nodes are packed into one Teradata cabinet. The Parallel Database Extensions (PDE) software runs on top of the Linux operating system.
Each node runs the primary components of a Teradata system (see our article about the Teradata high-level architecture):
– The Parsing Engines
– The AMPs
– Two redundant BYNETs for the communication between AMPs and Parsing Engines.
As we already know, a great deal of parallelism is built into a single node because the workload is spread across all AMPs, ideally evenly.
Scalability is one of the main benefits of the Teradata architecture; multiple nodes can be connected to form an even larger system.
In theory, doubling the number of nodes doubles the performance. In real life, this is a fairy tale.
The problem with this theory of linear scalability is easy to spot: it would require perfect parallelism in your workload.
But as all of us know from practice, we are always fighting a skewed (unevenly distributed) workload. We could have hundreds of nodes; they will not improve performance if the workload of our SQL statement ends up on a single AMP. Keep this in mind when adding nodes to your system to avoid disappointment.
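The effect of skew on scaling can be illustrated with a small Python sketch. It is a toy model, not a description of how Teradata schedules work: it assumes that the elapsed time of a parallel step is set by the busiest AMP, and the names `distribute`, `elapsed_time`, `speedup_from_doubling`, and the parameter `hot_share` (the fraction of rows that land on one AMP) are hypothetical.

```python
def distribute(total_rows, amps, hot_share=0.0):
    """Spread total_rows over amps; a fraction hot_share lands on AMP 0 only."""
    hot_rows = total_rows * hot_share
    even_rows = (total_rows - hot_rows) / amps
    # AMP 0 gets its even share plus all hot rows, every other AMP the even share.
    return [even_rows + hot_rows] + [even_rows] * (amps - 1)


def elapsed_time(rows_per_amp):
    """Assumption: a parallel step finishes when the busiest AMP finishes."""
    return max(rows_per_amp)


def speedup_from_doubling(total_rows, amps, hot_share):
    """Ratio of elapsed time before and after doubling the number of AMPs."""
    before = elapsed_time(distribute(total_rows, amps, hot_share))
    after = elapsed_time(distribute(total_rows, 2 * amps, hot_share))
    return before / after


if __name__ == "__main__":
    for share in (0.0, 0.5, 1.0):
        print(share, speedup_from_doubling(1_000_000, 100, share))
```

In this model, a perfectly even workload gives a speedup of 2.0 when the AMP count doubles. If half of the rows sit on one AMP, the speedup drops to roughly 1.005, and if all rows sit on one AMP, doubling the system changes nothing.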
Fault tolerance also limits the number of nodes. Architectures of this kind are not inherently fault-tolerant, so scaling a Teradata system up to thousands of nodes is not feasible. Concepts like hot standby nodes may relieve the situation slightly, but they are a costly way to buy some fault tolerance.
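One way to see why the node count matters for fault tolerance is a simple probability estimate. The sketch below assumes that nodes fail independently and that each node is down with the same probability; both are simplifications, and the function name `prob_any_node_down` and the example value of 1% are purely illustrative, not measured figures.

```python
def prob_any_node_down(nodes, p_node_down):
    """Probability that at least one of `nodes` independent nodes is down.

    Assumption: every node is down with probability p_node_down,
    independently of all other nodes.
    """
    return 1.0 - (1.0 - p_node_down) ** nodes


if __name__ == "__main__":
    # Hypothetical per-node probability, chosen only for illustration.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_any_node_down(n, 0.01), 4))
```

Under these assumptions, the chance that some node is unavailable grows quickly with the number of nodes, which is why redundancy such as hot standby nodes becomes more important, and more expensive, as the system grows.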
In the terminology of parallel systems, a single node is called a symmetric multiprocessing (SMP) node. Any system containing at least two nodes is called a massively parallel processing (MPP) system.
While the communication network (BYNET) within one node is realized as software, the network between nodes must be implemented in hardware. Still, the purpose is the same: Allow AMPs and Parsing Engines to communicate with each other, even across different nodes.
For performance and fault tolerance reasons, there are always two BYNETs available.
As long as both networks operate without errors, they are used simultaneously to increase throughput. If one of the networks fails, the other is still available as a backup, and the system continues to operate. Only the failure of both networks would make the Teradata system inoperative.
Some years ago, the BYNET was one of Teradata's significant advantages because it took over the tasks of sorting and merging data, relieving the CPUs. Today, with multicore processors available, this benefit is probably no longer significant. The change may be related to the switch from the proprietary BYNET to InfiniBand as the new backbone for data transmission.
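The behavior of the two BYNETs described above can be summarized in a few lines of Python. This is only a model of the three cases (both up, one up, none up); the function names and status strings are hypothetical and do not correspond to any Teradata tool or API.

```python
def available_links(bynet_0_up, bynet_1_up):
    """Count how many of the two BYNETs are currently working."""
    return int(bynet_0_up) + int(bynet_1_up)


def system_status(bynet_0_up, bynet_1_up):
    """Map the state of both networks to the resulting system behavior."""
    links = available_links(bynet_0_up, bynet_1_up)
    if links == 2:
        return "operational: traffic is shared across both BYNETs"
    if links == 1:
        return "operational: the surviving BYNET carries all traffic"
    return "inoperative: AMPs and Parsing Engines cannot communicate"


if __name__ == "__main__":
    for state in ((True, True), (True, False), (False, True), (False, False)):
        print(state, "->", system_status(*state))
```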