Introduction to Teradata in the Cloud
Cloud databases are putting pressure on traditional MPP (massively parallel processing) data warehouse systems.
This post explains why and outlines the pros and cons of each type of database system.
The Architecture Of An MPP Database System
MPP database systems employ a shared-nothing architecture, wherein each node possesses its own CPU, main memory, and mass storage device.
Besides Teradata, examples of MPP database systems include Netezza, Amazon Redshift, and Microsoft Azure Synapse Analytics.
An MPP system aims to distribute data evenly across all nodes.
All MPP database systems share the same fundamental architecture. However, how data is localized and stored on nodes by rows or columns varies among them.
Each manufacturer has devised its own tactics.
Teradata can store data by row or by column. In addition, data in a column-partitioned table can be retrieved through a primary index.
Netezza uses its hardware (FPGAs) together with zone maps to determine where the requested data cannot be located, so those regions are skipped, and it limits reads to the columns a query requires.
Almost all MPP database systems, including Netezza, Amazon Redshift, and Microsoft Azure Synapse, offer the following three options for distributing data:
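To make the zone map idea concrete, here is a minimal sketch in Python. It assumes a zone map is simply the minimum and maximum value stored for each block of data; the `Extent` class and `scan_equal` function are hypothetical names for illustration and not part of any Netezza API.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A block of column values; hypothetical stand-in for a storage extent."""
    values: list

    @property
    def zone(self):
        # The zone map entry: smallest and largest value in this extent.
        return min(self.values), max(self.values)


def scan_equal(extents, target):
    """Return matching values and the number of extents that had to be read."""
    hits = []
    extents_read = 0
    for extent in extents:
        low, high = extent.zone
        if target < low or target > high:
            # The zone map proves the value cannot be here, so skip the read.
            continue
        extents_read += 1
        hits.extend(v for v in extent.values if v == target)
    return hits, extents_read


extents = [Extent([1, 5, 9]), Extent([10, 14, 19]), Extent([20, 25, 29])]
hits, extents_read = scan_equal(extents, 14)
print(hits, extents_read)
```

In this toy data set only the middle extent can contain the value 14, so the other two are never read. The benefit depends on the data being roughly ordered on the filtered column; if values are scattered, every zone overlaps and nothing can be skipped.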
- Distribute All
Tables are copied in full to every node. This is ideal for small tables, because they are then available for joins on all nodes without copying data at query time. Teradata does not offer this kind of distribution, but it copies whole tables to all nodes when needed to prepare a join during query execution.
- Distribute By Hash
Rows are assigned to nodes by hashing a distribution key (in Teradata, this is the primary index).
- Distribute Randomly
The data of a table is distributed evenly but randomly across all nodes. In Teradata, this is achieved by using so-called NoPI (no primary index) tables.
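The three options can be sketched in a few lines of Python. This is a simplified illustration, not how any of the vendors implement it: the function names, the four-node cluster, and the use of SHA-256 modulo the node count are all assumptions made for the example.

```python
import hashlib
import random

NODES = 4  # assumed cluster size for the example


def distribute_all(rows, nodes=NODES):
    """Distribute All: every node receives a full copy of the table."""
    return {n: list(rows) for n in range(nodes)}


def distribute_by_hash(rows, key, nodes=NODES):
    """Distribute By Hash: the hash of the key column decides the node."""
    placement = {n: [] for n in range(nodes)}
    for row in rows:
        digest = hashlib.sha256(str(row[key]).encode()).digest()
        placement[int.from_bytes(digest[:8], "big") % nodes].append(row)
    return placement


def distribute_randomly(rows, nodes=NODES, seed=0):
    """Distribute Randomly: each row lands on a randomly chosen node."""
    rng = random.Random(seed)
    placement = {n: [] for n in range(nodes)}
    for row in rows:
        placement[rng.randrange(nodes)].append(row)
    return placement


rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
for name, placement in [
    ("all", distribute_all(rows)),
    ("hash", distribute_by_hash(rows, "customer_id")),
    ("random", distribute_randomly(rows)),
]:
    print(name, [len(v) for v in placement.values()])
```

The practical difference is that only hash distribution is predictable: rows with the same key value always land on the same node, which is what makes node-local joins on that key possible.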
Advantages Of An MPP Database System
We can achieve excellent performance by distributing the load across nodes.
- Scalability and Concurrency
In principle, MPP systems can be scaled linearly by adding new nodes (CPU, memory, and mass storage). Ideally, doubling the number of nodes doubles the performance.
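The linear-scaling claim can be written down as a small back-of-the-envelope model. It assumes perfectly even distribution and a fixed scan rate per node, and it ignores coordination overhead; the function name and the numbers are made up for illustration.

```python
def elapsed_seconds(total_rows, nodes, rows_per_second_per_node):
    """Time for a full scan when every node works on its own share in parallel."""
    # With even distribution, the busiest node holds ceil(total_rows / nodes) rows.
    rows_on_busiest_node = -(-total_rows // nodes)
    return rows_on_busiest_node / rows_per_second_per_node


base = elapsed_seconds(1_000_000_000, nodes=8, rows_per_second_per_node=1_000_000)
doubled = elapsed_seconds(1_000_000_000, nodes=16, rows_per_second_per_node=1_000_000)
print(base, doubled, base / doubled)
```

Under these assumptions the scan takes 125 seconds on 8 nodes and 62.5 seconds on 16 nodes. The model also shows where the ideal breaks down: elapsed time is set by the busiest node, so any uneven distribution reduces the gain.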
Disadvantages Of An MPP Database System
Most MPP database systems come with hardware that has been specially optimized to achieve the best performance.
Examples include the BYNET in Teradata, which performs particular tasks such as sorting and merging answer sets, and Netezza's special hardware, which restricts the amount of data that has to be read.
This often makes the system complicated and expensive.
- Distribution of Data
The defining strength of MPP database systems is also their biggest weakness: data must be spread evenly across the nodes. Even distribution is essential for performance, but choosing the right distribution key is left to the user, and a poorly chosen key leaves some nodes with far more rows than others.
Modern cloud databases like Snowflake do not have this problem because they are shared-data systems in which all nodes can access the common database.
Scaling an MPP system up or down also involves downtime, during which the data must be redistributed evenly across the old and the new nodes.
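Both problems, skew from a poor distribution key and data movement when nodes are added, can be illustrated with a short Python sketch. It assumes the simplest possible placement scheme, hash modulo node count. Real systems typically place an extra mapping layer between hash values and nodes so that fewer rows move, so treat the numbers as an illustration rather than a measurement. All names and data are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key, nodes):
    """Assumed placement rule: hash of the key modulo the number of nodes."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def skew_factor(keys, nodes):
    """Rows on the busiest node divided by the average per node (1.0 is ideal)."""
    counts = Counter(node_for(k, nodes) for k in keys)
    total = sum(counts.values())
    return max(counts.values()) / (total / nodes)


def moved_fraction(keys, old_nodes, new_nodes):
    """Share of rows whose node changes when the cluster is resized."""
    keys = list(keys)
    moved = sum(node_for(k, old_nodes) != node_for(k, new_nodes) for k in keys)
    return moved / len(keys)


order_ids = range(100_000)  # unique key
countries = ["US"] * 70_000 + ["DE"] * 20_000 + ["FR"] * 10_000  # three distinct values

print("skew with unique key:  ", round(skew_factor(order_ids, 8), 2))
print("skew with country key: ", round(skew_factor(countries, 8), 2))
print("rows moved going 4 -> 5 nodes:", f"{moved_fraction(order_ids, 4, 5):.0%}")
```

The unique key spreads rows almost perfectly, while the country key puts at least 70% of the rows on a single node, so that node determines the query time. For the resize, a row stays put only if its hash gives the same result modulo 4 and modulo 5, which holds for 4 out of every 20 hash values, so roughly four out of five rows have to move under this naive scheme.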
- Lack of elasticity
MPP database systems are less well suited to changing workloads than cloud databases because they lack elasticity.
MPP database systems can scale, but this can take weeks because hardware has to be added or data has to be restructured. Snowflake, for example, can scale almost instantly and without downtime. Snowflake is a true cloud database.
Many manufacturers now offer their databases in the cloud, but essential features are missing. I don’t consider them cloud databases, but it’s a matter of definition.
Teradata is also available in the cloud. But what remains of Teradata if there is no BYNET anymore? What would Netezza be without dedicated hardware? I think merely running your database on somebody else’s computer is not enough.