
In this article, I will explain the main architectural differences between Teradata and Snowflake.

It is not meant to be a complete comparison of all features, advantages, and disadvantages; it is limited to the differences from an architectural perspective.

Snowflake vs. Teradata Architecture


How is the data distributed in Snowflake?

Snowflake is not a Shared Nothing architecture; instead, the computing resources access shared data.

Therefore, there is no concept like the Primary Index: all computing resources can access all of the data.
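For contrast, Teradata's Shared Nothing distribution can be sketched in a few lines. This is a simplified illustration (the hash function and system size are stand-ins, not Teradata's actual row-hash algorithm); the point is only that each row's home AMP is fixed by its Primary Index value:

```python
import hashlib

NUM_AMPS = 4  # hypothetical small system

def amp_for_row(primary_index_value):
    """Map a Primary Index value to a fixed AMP (a simplified stand-in
    for Teradata's row hash plus hash map lookup)."""
    h = int(hashlib.md5(str(primary_index_value).encode()).hexdigest(), 16)
    return h % NUM_AMPS

# Every query touching CustomerId 4711 must involve this one AMP;
# in Snowflake, by contrast, any compute node can read any data.
print(amp_for_row(4711))
```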

What happens if the system size has to be increased?

This is the advantage of cloud architectures such as Snowflake: It simply scales.

On a Teradata system, an upgrade always means restructuring. When AMPs are added, tables must be redistributed to ensure optimal parallelism.
A new hash map that includes the additional AMPs is needed. Since Teradata 16, however, the restructuring can be delayed by using several hash maps.
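A toy model, not Teradata's actual implementation, can show why a new hash map implies data movement. The bucket count and the round-robin assignment below are hypothetical, but the effect is the same: once the bucket-to-AMP assignment changes, the rows in the affected buckets must be moved:

```python
# Simplified sketch: rows land on AMPs via a hash-bucket -> AMP map,
# so a new map for more AMPs moves rows between AMPs.
NUM_BUCKETS = 16  # real Teradata systems use far more hash buckets

def build_hash_map(num_amps):
    # round-robin assignment of hash buckets to AMPs
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]

old_map = build_hash_map(num_amps=4)
new_map = build_hash_map(num_amps=6)  # system upgrade: two more AMPs

moved = sum(1 for b in range(NUM_BUCKETS) if old_map[b] != new_map[b])
print(f"{moved} of {NUM_BUCKETS} buckets changed their AMP")  # 8 of 16
```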

Snowflake also has a significant advantage over a nearly full Teradata system when you want to save space, e.g., with Multi-Value Compression (MVC). The paradox on Teradata: you first need a lot of free space in order to save space. In Snowflake, the data is always compressed, and if space runs out, new space can be added at short notice.

What about indexes?

There is no such thing as a secondary index or join index in Snowflake.

What about workload management?

Snowflake relies on the concept of the virtual warehouse to separate workloads. Snowflake claims this makes workload management unnecessary, but looking at Teradata, it makes sense to have both.
Teradata offers genuinely sophisticated workload management (TASM) and the ability to partition the system. A virtual partition can access the full resources (CPU) if they are not needed by another partition, which is a significant advantage.

What is a disadvantage of the Teradata architecture compared to Snowflake?

Scaling is not as flexible as with a cloud database in which compute nodes and data can scale independently.

In Teradata, computing power and data scale together. This ensures that IO (typically the bottleneck) is always available in sufficient quantity.

The disadvantage is that more storage cannot simply be added without redistributing the data. Modern database systems run as much as possible in main memory. Teradata, on the other hand, uses spool files in each step of the execution plan, which often have to be written temporarily to mass storage.

How are statistics collected?

In contrast to Teradata, Snowflake collects the required statistics on its own, without the user having to do anything.

How does Snowflake access data?

Tables are stored in so-called micro partitions (max. 500 MB in size). Within each micro partition, the data is stored in columnar format, and the individual column values are compressed.

Each micro partition has metadata that stores the offset of each column as well as its range of values. To access data, Snowflake looks up in the metadata which micro partitions are needed to answer a query and where the required columns' data is located.
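A minimal sketch of this pruning idea (the table, column names, and value ranges are invented for illustration; Snowflake's real metadata is richer):

```python
# Toy illustration of metadata-based pruning: keep only the micro
# partitions whose [min, max] range can contain the searched value.
micro_partitions = [
    {"id": 1, "min_customer_id": 1,    "max_customer_id": 1000},
    {"id": 2, "min_customer_id": 1001, "max_customer_id": 2000},
    {"id": 3, "min_customer_id": 2001, "max_customer_id": 3000},
]

def prune(partitions, search_value):
    return [p["id"] for p in partitions
            if p["min_customer_id"] <= search_value <= p["max_customer_id"]]

# A query like WHERE CustomerId = 1500 only has to scan partition 2:
print(prune(micro_partitions, 1500))  # [2]
```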

This is completely different from the hashing used in Teradata. As you can see, Snowflake is a pure column store. What comes closest to Teradata's row-level partitioning is the sorting of data within Snowflake's micro partitions.

What does SQL Tuning look like in Snowflake?

The tuning options in Snowflake are limited compared to Teradata. However, Snowflake's architecture makes many of Teradata's numerous options unnecessary.

There is no workload management comparable to Teradata (TASM) in Snowflake.

Workload tuning is done by defining virtual warehouses; in principle, these are nothing more than independent compute clusters accessing the same data.

By sizing these clusters and assigning workloads to them, performance can be optimized at a coarse level.

Snowflake uses intensive disk-level caching. The disk cache is a combination of SSD and main memory. When creating virtual warehouses, it is advisable to place users who execute the same queries in the same virtual warehouse; then the disk cache can be used optimally.
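Why grouping users with similar queries into one warehouse pays off can be illustrated with a toy LRU cache standing in for the warehouse-local disk cache (the capacity and access pattern are made up):

```python
from collections import OrderedDict

class DiskCache:
    """Tiny LRU cache standing in for a virtual warehouse's local disk cache."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, partition_id):
        if partition_id in self.data:
            self.hits += 1
            self.data.move_to_end(partition_id)  # mark as recently used
        else:
            self.misses += 1
            self.data[partition_id] = True
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)    # evict least recently used

# Two users running the same query profile share one warehouse's cache:
shared = DiskCache(capacity=10)
for _ in range(2):            # user A warms the cache, user B reuses it
    for pid in range(5):
        shared.read(pid)
print(shared.hits, shared.misses)  # 5 5
```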

Snowflake also offers a result set cache, which is available to every user independent of the virtual warehouse.

This cache holds the result sets of all queries of the last 24 hours. Each time a cached result is reused, the counter restarts from zero. This is far more extensive than the caching of query plans in Teradata.
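The retention behavior described above can be sketched as follows. This is a simplified model, not Snowflake's implementation; the query text and timestamps are invented:

```python
import time

RESULT_CACHE_TTL = 24 * 3600  # 24 hours, as described above

class ResultCache:
    """Sketch of a result-set cache whose retention restarts on each hit."""
    def __init__(self):
        self.entries = {}  # query_text -> (result, last_used_timestamp)

    def put(self, query, result, now=None):
        self.entries[query] = (result, now if now is not None else time.time())

    def get(self, query, now=None):
        now = now if now is not None else time.time()
        entry = self.entries.get(query)
        if entry and now - entry[1] < RESULT_CACHE_TTL:
            self.entries[query] = (entry[0], now)  # the counter restarts at zero
            return entry[0]
        return None  # expired or never cached

cache = ResultCache()
cache.put("SELECT COUNT(*) FROM Sales", 42, now=0)
print(cache.get("SELECT COUNT(*) FROM Sales", now=23 * 3600))  # hit: 42
print(cache.get("SELECT COUNT(*) FROM Sales", now=46 * 3600))  # still a hit: 42
```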

Teradata offers a block-level cache via Teradata Intelligent Memory. This cache is available to all users of the data warehouse but is purged frequently and temperature-based. It is not a result set cache; only data blocks are cached.

It may be tempting to shut down a Snowflake virtual warehouse as soon as possible to save costs, but the disk cache is lost when the warehouse is shut down. Here, costs must be weighed against performance. This is not an issue with Teradata; the system is available 24/7.

Snowflake does not use user-defined indexes or partitioning. However, it is possible to define a cluster key to co-locate specific column values within one or a few micro partitions:

ALTER TABLE Customer CLUSTER BY (CustomerId);

How are user sessions handled?

As a counterpart to Teradata's parsing engine, which handles sessions and delivers result sets to the client, Snowflake has a service layer.

This manages sessions and everything that goes with them (queries, transactions, result sets, etc.) and also maintains a cache of the results of executed queries.

What about Referential Integrity?

Snowflake only offers what Teradata calls soft referential integrity. There is no enforced referential integrity as in Teradata.

Is Skewing a problem in Snowflake?

When we speak of skew in Teradata, we mean an uneven load across the AMPs.
Skew also exists in Snowflake, at the node level. Snowflake uses a method to minimize skew automatically: file stealing.

When a worker process has finished its work (i.e., scanned its input files), it requests additional files from other worker tasks. The other worker tasks check how many input file sets they have left to process and pass them to the requesting worker task. This automatically balances the skew.
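A toy simulation of this idea (the worker names and file counts are invented; the real mechanism operates on the file sets within a query's scan set):

```python
from collections import deque

# Toy simulation of "file stealing": when a worker runs out of input
# files, it takes files from the peer with the largest backlog.
backlogs = {
    "worker_1": deque(f"file_{i}" for i in range(9)),  # skewed worker
    "worker_2": deque(),                               # finished early
}

def next_file(worker):
    """Return the next input file for a worker, stealing from the most
    loaded peer when the worker's own backlog is empty."""
    if backlogs[worker]:
        return backlogs[worker].popleft()
    donor = max(backlogs, key=lambda w: len(backlogs[w]))
    if backlogs[donor]:
        return backlogs[donor].pop()  # steal from the back of the donor's queue
    return None  # all work is done

processed = {"worker_1": 0, "worker_2": 0}
while any(backlogs.values()):
    for worker in processed:
        if next_file(worker) is not None:
            processed[worker] += 1

print(processed)  # {'worker_1': 5, 'worker_2': 4}
```

The early finisher ends up sharing the skewed worker's load, which is exactly the balancing effect described above.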

This kind of skew optimization is not possible on a shared-nothing system like Teradata.

Does Snowflake also come with its own hardware?

No. Snowflake runs exclusively in the cloud. In principle, Snowflake has understood how to make the best use of the various cloud platforms (AWS, Google Cloud, Azure).

Take AWS (Amazon Web Services) as an example:

  • Data storage uses Amazon S3.
  • Computing power in the form of virtual warehouses is provided via clusters of EC2 instances.
  • The disk cache is provided by the SSDs and main memory of the EC2 instances.

Meanwhile, Teradata is also available in the cloud. However, it should be remembered that the disadvantages of the Shared Nothing architecture still apply there.

What kind of optimizer does Snowflake use?

Like Teradata, Snowflake uses a cost-based optimizer.

However, all required statistics are automatically maintained while loading or updating the data. In Teradata, these must be defined at least once by the user (there is, however, the option of having Teradata analyze the query logs to suggest statistics).

When is the Execution Plan created in Snowflake?

We know IPE (Incremental Planning and Execution) from Teradata. Here, the execution plan is modular, and the final plan is determined during the query's execution.

Snowflake also makes certain decisions only at query runtime. This includes, for example, the selection of the join method.
Both optimizers can, therefore, generate dynamic execution plans.
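As a sketch of such a runtime decision (the threshold and the strategy names are hypothetical and not taken from either optimizer):

```python
BROADCAST_THRESHOLD = 10_000  # hypothetical row-count cutoff

def choose_join_method(left_rows, right_rows):
    """Pick a join strategy at runtime, once the actual row counts of the
    intermediate results are known, rather than trusting estimates alone."""
    if min(left_rows, right_rows) < BROADCAST_THRESHOLD:
        return "broadcast the smaller side"
    return "hash-partition both sides"

# The optimizer may have estimated millions of rows at compile time; at
# runtime, a selective filter left only 500 rows, so the cheaper
# broadcast join is chosen:
print(choose_join_method(left_rows=500, right_rows=2_000_000))
```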

Will Snowflake squeeze Teradata out of the market?

I think not. I hope not.

Teradata is simply a perfectly tuned system, and I don’t think it will quickly disappear from the market.

However, I do see Hadoop’s chances dwindling.

For example, it is trendy to use Hadoop to store JSON and, after conversion, load it into a database system. Snowflake handles this task with flying colors; there is no need for Hadoop.

I must admit that I am impressed with Snowflake. I have always been skeptical of the latest hypes, such as cloud, data lake, and Hadoop. It is difficult to separate the marketing noise from the actual value. But here, the developers of Snowflake seem to have hit the mark.

Will Teradata one day have the same advantages as Snowflake (i.e. elasticity through cloud/shared data architecture)?

The architecture of Teradata limits this. Even if Teradata is offered in the cloud, this does not mean that it is a full “cloud database”.

Here one must look very carefully at what the individual providers understand by cloud. There is a lot of marketing involved.

Snowflake vs. Teradata Summary

As you have seen, with Snowflake we are dealing with a completely different design.

Do you have experience with Teradata and Snowflake? Leave me a comment and tell me about your experience!


There are countless instructions on the internet on migrating from Teradata to Snowflake with seemingly no complications. To me, it almost seems like an affiliate program driven by Snowflake.

I would be surprised if migration were as easy as it is presented.

Please write to me about your experiences with Snowflake and whether the high expectations of Snowflake Marketing keep their promises. Here is a brief overview of what to expect when migrating from Teradata to Snowflake:

  • Table DDL statements can be quite easily migrated
  • View definitions can be quite easily migrated
  • Teradata stored procedures have to be rewritten in JavaScript or Python
  • Macros have to be rewritten in JavaScript or Python and become stored procedures in Snowflake
  • Teradata load utilities (FastLoad, MultiLoad, TPump, TPT) are replaced by the COPY INTO command.

Try Out Snowflake For Free

You’ll receive $400 of Snowflake credit to use over 30 days:

https://trial.snowflake.com/


DWH Pro Admin

Roland Wenzlofsky is an experienced freelance Teradata consultant and performance trainer. Born in Austria's capital, Vienna, he has been building and tuning some of the largest Teradata data warehouses in the European financial and telecommunications sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail, and all the tricks and tips that will help you succeed.

  • Samrat Pandiri says:

    Hi Ronald.. Can you shed some light on this paragraph – “Teradata offers to the cache using Teradata Virtual Memory. The cache is available to all users of the data warehouse. This cache holds the result sets of all queries of the last 24 hours. This is far more extensive than the caching of queries in Teradata.”

    • Hi Samrat. Somehow I mixed up the sentences by mistake. The 24-hour cache is available in Snowflake and the counter is starting from zero again if the result set of a query is taken from the cache. Teradata has intelligent memory which is data temperature driven. I hope it is clearer now (and correct).

      BR

      Roland

      • Samrat Pandiri says:

        I think this needs to be tweaked a bit more to clear the confusion.

        As I understand, Snowflake does cache the results, but Teradata doesn’t cache the results. One thing it does store in the cache is the plastic steps.
        Coming to Teradata Intelligent Memory, it just moves your data blocks from cold to hot storage based on access. So, this method will cache frequently used data blocks in memory rather than accessing them from disk.

        • You are completely right. Somehow I was mixing the different types of caching. I changed it again. Thanks for your help. I really appreciate it!

          • Hi… just stumbled on this very interesting thread. To delve a bit deeper into this, you are correct that Teradata does not cache results but it does offer two methods of data caching. The first is automatic: FSGCache is used to store Most Recently Used data blocks and is available to all queries globally. The size of FSGCache is tunable. The second is TIM (Teradata Intelligent Memory) which acts as a storage tier for very hot data. This operates as a Most Frequently Used cache. TIM is also configurable but optional. As the price of memory has dropped over the years, Teradata applications have been able to leverage larger memory and achieve much higher IO efficiencies with these two methods.
