Teradata Architecture – AMP, BYNET and Parsing Engine
Teradata Query Parallelism
If we run a query on a Teradata system, every step runs in parallel, whether it is a join, a sort, an aggregation, or anything else.
The great advantage is that this applies to each step of query processing; that is what makes Teradata unique. Its big advantage over other database systems is that this high level of parallelism has been part of the Teradata architecture from the start, many years ago, when most of its components were still implemented in hardware rather than software.
Over time, new features have further increased parallelism, but much of this would not have been possible without the architecture underlying Teradata.
Parallel Execution across the AMPs
AMPs are independent processes that perform a wide range of activities independently of each other. Each SQL query is first broken down into subtasks that are assigned to the AMPs. Each AMP completes its tasks and delivers a partial result. When all AMPs have finished, the overall result is assembled and returned.
Parallelism at the AMP level is one of the main reasons why a Teradata system offers tremendous performance when used the right way.
AMPs are not specialized but capable of performing any kind of task.
Today, hundreds of AMPs are available on each Teradata system.
The Tasks of a Teradata AMP
- Reading of Rows
- Writing of Rows
- Row Locking
- Index creation and maintenance
- Maintaining the transaction log
- Backup and Recovery
- Bulk and Transactional Loading
The Components of a Teradata AMP
Each AMP has its own exclusive resources:
- Main memory (Teradata Intelligent Memory and the FSG cache)
- A logical storage unit
Since Teradata is a shared-nothing architecture, all resources of an AMP are assigned exclusively to that AMP. Extending the system by adding hardware therefore allows performance to grow linearly.
The Teradata Primary Index – Hash Partitioning
Parallelism is achieved by hash partitioning. The data is distributed evenly across the available AMPs, with the goal that each AMP performs approximately the same amount of work.
Hash partitioning works well for distributing large amounts of data across the AMPs. The drawback is that individual AMPs become a bottleneck if the distribution is uneven. The result is skewing, and it occurs frequently in practice.
Skewing is one of the main problems a performance tuner on a Teradata system has to solve.
Hash partitioning is easily achieved by defining one or more columns for each table from which a hash value is calculated. This hash value determines the AMP for each row. The columns used for hash partitioning are the so-called Primary Index:
CREATE TABLE Customer
(
   Customer_ID BIGINT NOT NULL,
   Lastname    VARCHAR(500),
   Firstname   VARCHAR(500)
) UNIQUE PRIMARY INDEX (Customer_ID);
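The effect of such a definition can be illustrated with a toy sketch in plain Python (this is not Teradata's actual hashing algorithm; AMP count and data are invented): a hash value computed from the primary index column decides which AMP stores each row.

```python
import hashlib

NUM_AMPS = 4  # a real system runs hundreds of AMPs

def amp_for(primary_index_value):
    """Map a primary index value to an AMP via a hash
    (toy stand-in for Teradata's row-hash calculation)."""
    row_hash = int.from_bytes(
        hashlib.md5(repr(primary_index_value).encode()).digest()[:4], "big")
    return row_hash % NUM_AMPS

# Distribute some customer rows by Customer_ID:
amps = {n: [] for n in range(NUM_AMPS)}
for customer_id in range(1, 13):
    amps[amp_for(customer_id)].append(customer_id)

# Every row lands on exactly one AMP, and the same primary index
# value always lands on the same AMP:
assert amp_for(7) == amp_for(7)
print({amp: len(rows) for amp, rows in amps.items()})
```

Because the hash function is deterministic, the system can later locate any row by hashing its primary index value again, without a central lookup table of rows.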
Teradata is Pipelining the Steps
Pipelining increases the parallelism of queries even more.
As already mentioned, a request is split into several steps. Each step represents a subtask. This can be, for example:
- Read all rows of a table
- Update a subset of table rows
- Read two tables, redistribute them and join them
For example, pipelining means that while rows of the tables are still being redistributed, the join can start on the rows that have already been redistributed.
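The idea can be sketched with Python generators (names and data are made up for illustration): the join consumes redistributed rows one at a time, before the producer has emitted them all.

```python
def redistribute(rows):
    """Producer step: yields rows one by one, as a generator,
    instead of materializing the full redistributed table first."""
    for row in rows:
        yield row  # downstream steps can consume this immediately

def hash_join(probe_rows, build_side):
    """Consumer step: joins each incoming row against a prebuilt
    lookup table without waiting for all probe rows to arrive."""
    for key, value in probe_rows:
        if key in build_side:
            yield (key, value, build_side[key])

orders = [(1, "order-a"), (2, "order-b"), (3, "order-c")]
customers = {1: "Smith", 3: "Jones"}

# The pipeline: redistribution and join are interleaved, row by row.
result = list(hash_join(redistribute(orders), customers))
print(result)  # [(1, 'order-a', 'Smith'), (3, 'order-c', 'Jones')]
```

Nothing in this pipeline waits for a complete intermediate table; each row flows through both steps as soon as it is available, which is the essence of pipelining.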
As you can see, steps can have different complexity, and AMPs may also need to interact with each other.
The Teradata Architecture
Database systems all work in a similar way: data rows travel from storage to main memory.
Data rows can only be processed by the CPU when they are in main memory. Data on disk by itself is useless!
Storage, memory, and CPU form one unit.
Teradata is a shared-nothing architecture consisting of many of these units.
CPU, main memory, and storage are the physical components. Teradata runs many processes on each physical CPU to increase the degree of parallelism. These processes are the so-called AMPs.
AMPs are the workers doing the most important job: They are moving data between the disk and the main memory.
A typical Teradata system executes hundreds of AMPs at the same time, all of them doing their share of work in parallel with all other AMPs.
Each AMP owns two types of memory: the Teradata Intelligent Memory and the FSG cache. These two kinds of memory are used by the AMP to intelligently cache data from the persistent storage it owns.
Both types of memory are exclusive to each AMP; no other AMP can use the assigned memory. The AMP reads data rows from disk into memory, possibly changes them, and writes them back.
The Teradata Intelligent Memory contains the most frequently used data, while the FSG cache contains the data used by active requests.
But not only memory is assigned exclusively to an AMP. Each AMP owns a particular logical storage unit and is the only one who can read it and write to it.
There is one exception to the rule: in the case of an AMP failure, a backup AMP can take over the failed AMP's work until it is functional again.
The virtual storage system manages the storage of each AMP. While in earlier times only hard disks were available, current Teradata systems additionally offer solid-state drives. The virtual storage system is the interface between the AMP's logical storage unit and the physical storage device.
As modern Teradata systems often use a mix of solid-state drives and hard disks, Teradata Virtual Storage also monitors data usage and moves frequently used data to faster storage devices and less frequently used data to slower ones.
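The principle behind this temperature-based placement can be sketched as follows (counters, thresholds, and the two-tier model are invented for illustration; the real virtual storage system is far more sophisticated):

```python
from collections import Counter

access_counts = Counter()  # data block -> number of accesses
placement = {}             # data block -> "ssd" or "hdd"

def record_access(block):
    access_counts[block] += 1

def rebalance(hot_slots=2):
    """Move the most frequently accessed blocks to fast storage
    and everything else to slow storage."""
    hot = {b for b, _ in access_counts.most_common(hot_slots)}
    for block in access_counts:
        placement[block] = "ssd" if block in hot else "hdd"

for block in ["a", "a", "a", "b", "b", "c"]:
    record_access(block)
rebalance()
print(placement)  # 'a' and 'b' are hot -> ssd; 'c' -> hdd
```

The key point is that placement decisions are driven by observed access frequency, not by any property of the data itself.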
A network, the BYNET, connects all AMPs. It allows the Parsing Engine to communicate with the AMPs (giving them instructions), and it allows the AMPs to communicate with each other.
The Parsing Engine is responsible for checking the syntax of our request, ensuring that the required permissions are available, and creating the so-called execution plan (we will talk about execution plans in the next lesson).
Furthermore, the optimizer is part of the parsing engine software. We will talk a lot about the optimizer in the upcoming lectures, as it makes a lot of use of statistics.
AMPs do all the updates, deletes, and inserts on a table (and, as a side note, AMPs even have some special tasks, such as character set conversions).
Teradata people like talking about AMPs, but usually they mean the unit of AMP, memory, and disk.
Teradata distributes the data rows of a table across all AMPs
(precisely we would need to say “across the disks of all AMPs”).
Distribution is done to share the workload and is characteristic of any parallel database system. I repeat, as it's an important concept: the hashing algorithm spreads the rows of each table across all AMPs; in other words, each AMP holds rows of each table.
The Primary Index
The primary index is a mechanism defining the location of each data row; it may consist of a single column or multiple columns.
The values of the primary index columns may be unique or non-unique.
Don't confuse the primary index with the primary key of a table. The primary key is a concept of logical data modeling; the primary index defines the AMP (and therefore the logical storage unit) for each data row.
While the primary key is used to uniquely identify each row stored in a table, the main idea behind primary index choice is to use the parallel system in the most efficient way.
Each data row has to pass the hashing algorithm. The hashing algorithm calculates the so-called row hash from the primary index columns.
The input order of the primary index columns doesn't matter.
Data rows are distributed based on the calculated row hash.
The so-called hash map assigns each row hash to one particular AMP.
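This indirection can be sketched like this (sizes and the hashing are invented; a real hash map has far more entries than AMPs): the row hash selects a bucket in the hash map, and the bucket entry names the owning AMP.

```python
import hashlib

NUM_AMPS = 4
MAP_SIZE = 16  # real Teradata hash maps have many more buckets

def row_hash(value):
    """Toy row-hash calculation from a primary index value."""
    return int(hashlib.md5(repr(value).encode()).hexdigest(), 16)

# The hash map: bucket number -> AMP. Decoupling the hash value from
# the AMP through this table allows the system to be reconfigured
# (e.g. AMPs added) without changing the hashing algorithm itself.
hash_map = {bucket: bucket % NUM_AMPS for bucket in range(MAP_SIZE)}

def amp_for(value):
    bucket = row_hash(value) % MAP_SIZE
    return hash_map[bucket]

assert amp_for("Smith") == amp_for("Smith")  # deterministic
print(amp_for("Smith"), amp_for("Jones"))
```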
Whenever data rows are stored (inserted or updated) or retrieved, it is the owning AMP that does the work.
The same input values always map to the same row hash value (as long as the system configuration is unchanged).
Sometimes different combinations of primary index values map to the same row hash value; this is called a "hash collision."
Hash collisions can negatively impact performance.
The idea of hashing is to achieve an even distribution of data rows across all AMPs.
In the best case, each AMP has the same amount of data rows, as this ensures that the parallel system works in the most efficient way.
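Whether a primary index candidate distributes well can be checked with a quick simulation (toy hashing, invented data): a unique column spreads rows almost evenly, while a low-cardinality column such as a country code piles all rows onto very few AMPs.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8

def amp_for(value):
    """Toy stand-in for Teradata's row-hash-to-AMP mapping."""
    return int(hashlib.md5(repr(value).encode()).hexdigest(), 16) % NUM_AMPS

def rows_per_amp(values):
    return Counter(amp_for(v) for v in values)

unique_ids = range(10_000)                       # unique column
country = ["US"] * 9_000 + ["DE"] * 1_000        # only two distinct values

even = rows_per_amp(unique_ids)
skewed = rows_per_amp(country)

# With a unique column, every AMP gets close to the average of
# 1250 rows; with two distinct values, at most two AMPs get any
# rows at all - the rest of the parallel system sits idle.
print(max(even.values()), len(skewed))
```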
But there is a disadvantage with this approach: To join the rows of two tables they must be on the same AMP.
In other words: the primary indexes of both tables have to match the join columns.
If this is not the case, the data rows of one or both tables have to be redistributed by the join columns, or Teradata duplicates all data rows of one table to all AMPs. Whichever strategy is used to co-locate the rows, it is quite an expensive step.
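The redistribution step can be sketched as follows (toy code with invented data; the orders table's primary index does not match the join column, so its rows are re-hashed by customer_id before each AMP joins locally):

```python
import hashlib

NUM_AMPS = 4

def amp_for(value):
    """Toy row-hash-to-AMP mapping."""
    return int(hashlib.md5(repr(value).encode()).hexdigest(), 16) % NUM_AMPS

# Customer rows already live on the AMP given by their primary index:
customers = {amp: [] for amp in range(NUM_AMPS)}
for cid, name in [(1, "Smith"), (2, "Jones"), (3, "Miller")]:
    customers[amp_for(cid)].append((cid, name))

# Order rows were stored hashed by order_id; to join on customer_id
# they must first be redistributed by customer_id:
orders = [(101, 1), (102, 3), (103, 1)]  # (order_id, customer_id)
redistributed = {amp: [] for amp in range(NUM_AMPS)}
for order_id, cid in orders:
    redistributed[amp_for(cid)].append((order_id, cid))

# Now matching rows are co-located, and each AMP joins independently:
result = []
for amp in range(NUM_AMPS):
    lookup = {cid: name for cid, name in customers[amp]}
    for order_id, cid in redistributed[amp]:
        result.append((order_id, lookup[cid]))
print(sorted(result))  # [(101, 'Smith'), (102, 'Miller'), (103, 'Smith')]
```

Because both tables are hashed by the same column with the same function, every order row is guaranteed to land on the AMP that holds its matching customer row.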
Another limitation of hashing is that direct access to a data row is only possible if the predicate of a WHERE condition matches the primary index definition. Otherwise, in the absence of other indexes, the full table has to be scanned to find the requested data row.
We have to consider three criteria when selecting the primary index:
– A Good Access Path
– Even Distribution of Rows across all AMPs
– Low Volatility of the Primary Index Columns
A good access path means achieving optimal retrieve and join performance.
In retrieve steps, the primary index is the most efficient way to pick up data rows.
Join steps are fast if the join columns of both tables are the same, making them an excellent primary index candidate.
The second important criterion for primary index choice is even row distribution, to use the parallel architecture in an optimal way.
Often we can’t optimize for both mentioned goals at the same time. We may have to design the primary index for a fast access path, accepting that the data distribution is not ideal.
This is perfectly fine, as long as we are aware of it and know how to deal with the resulting issues.
Finally, the volatility of the primary index values should be low. When the primary index value of a row changes, the row is sent through the hashing function again and redistributed to its new AMP.
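The cost of a volatile primary index can be seen in a small sketch (toy hashing and data again): updating a primary index value amounts to deleting the row on its old AMP and inserting it on a new one, rather than changing it in place.

```python
import hashlib

NUM_AMPS = 4
amps = {n: {} for n in range(NUM_AMPS)}  # amp -> {pi_value: row}

def amp_for(value):
    """Toy row-hash-to-AMP mapping."""
    return int(hashlib.md5(repr(value).encode()).hexdigest(), 16) % NUM_AMPS

def insert(pi_value, row):
    amps[amp_for(pi_value)][pi_value] = row

def update_pi(old_value, new_value):
    """Changing the primary index value re-hashes the row: it is
    removed from its old AMP and stored on the new one - much more
    expensive than updating a non-index column in place."""
    row = amps[amp_for(old_value)].pop(old_value)
    insert(new_value, row)

insert(42, {"name": "Smith"})
old_amp = amp_for(42)
update_pi(42, 4711)
print(old_amp, amp_for(4711))  # the row may now live on a different AMP
```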
Ideally, we would like to have a non-volatile primary index with an even row distribution and a fast access path (indexed access).
I hope this short introduction to the Teradata architecture has laid a solid foundation for understanding the meaning of statistics in the upcoming lectures.