What Everyone Should Now About The Teradata Design
Teradata Query Parallelism
If we run a query on a Teradata system, it runs in parallel in every step, no matter if it is a join, sorting, aggregation step, etc.
The great advantage is that this applies to each step of the query process. That's what makes Teradata unique. The big advantage Teradata has over other database systems is that its high level of parallelism was part of the Teradata architecture from the start many years ago (when most of its components were implemented in hardware instead of software)
In the course of time, even new features have been made possible in order to further increase parallelism. However, much would not have been possible without the architecture underlying Teradata.
Parallel Execution across the AMPs
AMPs are independent processes that perform a wide range of activities independently of each other. Each SQL query is first broken down into subtasks assigned to the AMPs. Each AMP completes its tasks and delivers a partial result. When all AMPs are ready, the overall result is returned.
Parallelism at AMP level is one of the main reasons for a Teradata System offering tremendous performance when used the right way.
AMPs are not specialized but capable of performing any kind of task.
Today there are hundreds of AMPs on each Teradata System available.
The Tasks of a Teradata AMP
- Reading of Rows
- Writing of Rows
- Row Locking
- Index creation and maintenance
- Maintaining the transaction log
- Backup and Recovery
- Bulk and Transactional Loading
The Components of a Teradata AMP
Each AMP has its own exclusive resources:
- Logical storage unit
Since Teradata is a shared-nothing architecture, all resources of an AMP are only assigned to this AMP. An extension of the system (by adding hardware) therefore goes hand in hand with the possibility of linear growth in performance.
The Teradata Primary Index – Hash Partitioning
Parallelism is achieved by hash partitioning. The existing data is evenly distributed among the existing AMPS, with the goal that each AMP must perform approximately the same.
Hash partitioning works great to distribute large amounts of data to the AMPs. The drawback is that individual AMPs can represent a bottleneck if the task distribution is not even – skewing is the result and this pressure will often occur.
Skewing is one of the main problems a performance tuner on a Teradata system has to solve.
Hash partitioning is easily achieved by defining one or more columns for each table from which a hash value is calculated. This hash value determines the AMP for each row. The columns used for hash partitioning are the so-called Primary Index:
CREATE TABLE Customer(Customer_ID BIGINT NOT NULL,Lastname VARCHAR(500),Firstname VARCHAR(500)) UNIQUE PRIMARY INDEX (Customer_Id);
Teradata Pipelining Steps
Pipelining increases the parallelism of queries even more. Pipelining makes it possible to start a step even if the predecessor step is not yet finished.
As already mentioned, a request is split into several steps. Each step represents a subtask. This can be, for example:
- Read all rows of a table (simple step)
- Update a subset of table rows (simple step)
- Read two tables, redistribute them and join them (complex step)
Steps can have different complexity, and AMPs may also need to interact with each other.
In a join step (which is complex), pipelining allows starting already the join activity while rows of the participating tables are still being redistributed.
Teradata Parallel Multi-Steps
While pipelining steps are nested into each other, i.e. the input of the predecessor step is made available to the successor as early as possible, Teradata also has another level of parallelism available.
Steps that are independent of each other can be executed in parallel.
The Teradata BYNET
Teradata is a shared-nothing architecture and as we know the work is done in parallel by the AMPs.
But since there is a need for AMPs to communicate with each other, a network is needed to exchange both messages and data: The BYNET.
The Tasks of the BYNET
However, BYNET is more than just a normal network. It has functionality that has been specially developed for Teradata:
- Message Delivery: Guarantees that messages arrive at the target AMP
- Coordinate multiple AMPs working on the same step
- Sorting of the final result set when sending to the client
- Minimizes the number of AMPs needed for a step
- Congestion control to avoid an overloaded network
Message Passing & the BYNET
To better understand the tasks of BYNET we have to introduce a new kind of virtual process: The Parsing Engine. It is responsible for creating the execution plan of a request. The BYNET is the interface between the Parsing Engine and the AMPs.
Messages can be sent from the Parsing Engine via BYNET to the AMPs, but BYNET is also responsible for the AMP to AMP communication.
BYNET can send messages to all AMPs, a group of AMPS, or even a single AMP.
Sorting the Final Answer Set
Which is unique in Teradata:
The sorting of the ResultSet is done in parallel, at each level (AMP, Node, BYNET, Parsing Engine) data is pre-sorted to avoid an expensive Sort Step at the end:
- Each AMP locally sorts its data (this is done in parallel)
- Each Node takes one buffer of data from all its AMPs and sorts it (buffer by buffer by AMP)
- The BYNET passes one buffer per Node to the Parsing Engine which does the final sort.
These were the essential components of the Teradata Shared Nothing architecture. More details about how data is stored can be found here: