
Teradata – Off To The Cloud?

By DWH Pro Admin

January 25, 2020


Cloud databases are putting traditional MPP systems for data warehousing under increasing pressure.

In this blog post, I explain why this is so and show the advantages and disadvantages of both types of database systems.

The Architecture Of An MPP Database System

MPP database systems are shared-nothing systems. In an MPP database system, each node has its own CPU, main memory, and mass storage.

Besides Teradata, Netezza is a typical MPP database system. Amazon Redshift and Microsoft Azure Synapse Analytics also fall into this category.

The main characteristic of an MPP system is that data is distributed evenly across all nodes.

The basic architecture is, therefore, the same for all MPP database systems. The main difference is how the data is stored on the nodes (in rows or in columns) and how it is located.

Each manufacturer has developed its own strategy.

Teradata can store data in rows as well as in columns. By now, it is also possible to access data in a column-partitioned table via a primary index.

Netezza uses dedicated hardware (FPGAs) together with zone maps. The zone maps determine where the requested data is not located, and the queries are limited to the required columns.
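
To make the zone map idea more concrete, here is a minimal sketch in Python. It is not Netezza's implementation: the `Block` class, `make_block`, and `scan_range` are hypothetical names I made up for illustration, and I assume a zone map is simply the minimum and maximum value of one column per storage block.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Block:
    """A storage block plus its zone map entry for one column (assumed layout)."""
    rows: Tuple[int, ...]
    zone_min: int
    zone_max: int


def make_block(values: Iterable[int]) -> Block:
    """Build a block and record the min/max of its values once, at load time."""
    rows = tuple(values)
    return Block(rows, min(rows), max(rows))


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    hits: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The zone map proves the requested range cannot be in this block: skip it.
        if block.zone_max < low or block.zone_min > high:
            continue
        blocks_read += 1
        hits.extend(v for v in block.rows if low <= v <= high)
    return hits, blocks_read


# Three blocks holding the values 1-100, 101-200, and 201-300.
blocks = [make_block(range(1, 101)), make_block(range(101, 201)), make_block(range(201, 301))]
hits, blocks_read = scan_range(blocks, 150, 160)
```

In this example, only the middle block has to be read. The zone map never says where the data is, only where it cannot be, which is why it works best when the column values are roughly ordered on disk.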

Almost all MPP database systems offer the following three options for distributing the data (Netezza, Amazon Redshift, Microsoft Azure Synapse):

  • Distribute All

    Tables are copied in full to all nodes. This is ideal for small tables because they are already available for joining on every node without the need to copy data. (Teradata does not offer this kind of distribution, but if necessary, it copies whole tables during join preparation when executing a query.)
  • Distribute By Hash

    Here the rows are distributed by hashing a key (in Teradata, this is the primary index).
  • Distribute Randomly

    The data of a table is distributed evenly but randomly across all nodes. In Teradata, this is achieved by using so-called NoPI tables.
Distribution Options of an MPP Database System
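
The three distribution options can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the `(customer_id, name)` example rows, and the use of an MD5-based hash with a modulo are hypothetical and do not reflect how any of the vendors implement distribution internally.

```python
import hashlib
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]  # (customer_id, name): a made-up example schema


def stable_hash(key: object) -> int:
    """Deterministic hash of a key (Python's built-in hash() is salted for strings)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def distribute_all(rows: Sequence[Row], node_count: int) -> Dict[int, List[Row]]:
    """Distribute All: every node receives a full copy of the table."""
    return {node: list(rows) for node in range(node_count)}


def distribute_by_hash(rows: Sequence[Row], node_count: int, key_index: int = 0) -> Dict[int, List[Row]]:
    """Distribute By Hash: the hash of the key column decides the node."""
    nodes: Dict[int, List[Row]] = {node: [] for node in range(node_count)}
    for row in rows:
        nodes[stable_hash(row[key_index]) % node_count].append(row)
    return nodes


def distribute_randomly(rows: Sequence[Row], node_count: int, seed: int = 0) -> Dict[int, List[Row]]:
    """Distribute Randomly: each row goes to a randomly chosen node."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    nodes: Dict[int, List[Row]] = {node: [] for node in range(node_count)}
    for row in rows:
        nodes[rng.randrange(node_count)].append(row)
    return nodes


rows = [(i, f"customer-{i}") for i in range(1000)]
replicated = distribute_all(rows, 4)
hashed = distribute_by_hash(rows, 4)
scattered = distribute_randomly(rows, 4)
```

The practical difference shows up in joins: with hash distribution, rows with the same key value always land on the same node, so two tables distributed on the join column can be joined locally. With random distribution, the rows first have to be moved.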

Advantages Of An MPP Database System

  • Performance
    Excellent performance can be achieved by distributing the load across nodes.
  • Scalability and Concurrency

    In principle, MPP systems can be scaled linearly by adding new nodes (CPU, memory, and mass storage). Ideally, doubling the number of nodes doubles the performance.

Disadvantages Of An MPP Database System

  • Complexity
    Most MPP database systems come with their own hardware that has been specially optimized to achieve the best performance.

    Examples are the BYNET in Teradata, which performs specific tasks (sorting and merging of answer sets), and the special hardware from Netezza, which restricts the data that is read.

    This often makes the system complicated and expensive.
  • Distribution of Data

    The biggest advantage of MPP database systems is also their biggest disadvantage: the even distribution of data across nodes. An even distribution is essential, but choosing the right distribution key is up to the user.

    Modern cloud databases like Snowflake do not have this problem because they are shared-data systems in which all nodes can access the common database.
  • Downtime

    If the system is scaled up or down, this involves downtime during which the data must be redistributed evenly (across the old and the new nodes).
  • Lack of elasticity

    MPP database systems are less suitable than cloud databases because they lack elasticity.

    MPP database systems can scale, but this can take weeks because hardware has to be added or the data has to be restructured. Snowflake, for example, can scale in real time without any downtime. Snowflake is a real cloud database.

    Many manufacturers now offer their database in the cloud, but essential features are missing. I don’t consider them cloud databases, but it’s a matter of definition.

    Teradata is also available in the cloud. But what remains of Teradata if there is no BYNET anymore? What would Netezza be without dedicated hardware? In my view, running your database on somebody else's computer is not sufficient.
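
Two of the disadvantages above, the choice of the distribution key and the redistribution when nodes are added, can be made tangible with a small Python sketch. Everything in it is an assumption for illustration: `node_for`, `skew_factor`, and `moved_fraction` are hypothetical helpers, and I place rows with a naive hash modulo the number of nodes. Real systems map hash buckets to nodes and typically move less data than this naive scheme.

```python
import hashlib
from collections import Counter
from typing import Iterable


def node_for(key: object, node_count: int) -> int:
    """Naive placement (assumption): deterministic hash of the key modulo the node count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def skew_factor(keys: Iterable[object], node_count: int) -> float:
    """Rows on the fullest node divided by the average rows per node.

    1.0 means perfectly even; keys must not be empty.
    """
    counts = Counter(node_for(key, node_count) for key in keys)
    total = sum(counts.values())
    return max(counts.values()) / (total / node_count)


def moved_fraction(keys: Iterable[object], old_nodes: int, new_nodes: int) -> float:
    """Share of rows whose node changes when the node count changes; keys must not be empty."""
    key_list = list(keys)
    moved = sum(1 for key in key_list if node_for(key, old_nodes) != node_for(key, new_nodes))
    return moved / len(key_list)


order_ids = range(10_000)                     # unique values: a good distribution key
countries = ["AT"] * 9_000 + ["DE"] * 1_000   # two distinct values: a poor distribution key

good_skew = skew_factor(order_ids, 4)
bad_skew = skew_factor(countries, 4)
moved = moved_fraction(order_ids, 4, 8)
```

With the unique key, the skew factor stays close to 1. With the two-value key, at least two of the four nodes hold no rows at all, and the fullest node holds several times the average. And in this naive scheme, about half of the rows change their node when the system grows from four to eight nodes, which gives an idea of why resizing a shared-nothing system is not a quick operation.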


DWH Pro Admin

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he has been building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tips and tricks that will help you succeed.
