Teradata Vantage has also made its move into the public cloud. Teradata Vantage is now available in AWS, Microsoft Azure, and Google Cloud Storage. It is worth noting that the same Teradata software runs on these three major cloud providers. This makes it relatively easy to run Teradata on-premise and in these 3 different public clouds simultaneously.
Traditionally Teradata is an on-premise system with its file system based on block storage. Block storage is ideally suited for structured data as it occurs in a data warehouse and is IO optimized. In the cloud, object storage has prevailed. Object storage is typically cheaper and can be used for relational databases, but is mainly used for unstructured data. Amazon S3 is probably the best known, but Microsoft and Google also offer object storage with Azure Blob Storage and Google Cloud Storage.
With Teradata Vantage, it is now possible to access both types for storage for reading and writing at the same time. Teradata calls the access to the object store the Teradata Native Object Store or Teradata NOS. Object store data can be stored as CSV, JSON, or Parquet (columnar). We can query across tables that are partly stored in Teradata Vantage on block storage and partly e.g. on Amazon S3. So the whole thing is transparent, and once we have created a so-called external table in Vantage, we don’t have to worry about where the data is located. But there is, of course, also a piece of bad news: the performance will typically be much worse when object storage comes into play.
How does Teradata NOS work?
Teradata NOS is fully integrated with the Vantage SQL Engine. This also provides the full parallelism we are used to. When we read an external table, rows are distributed across all AMPs using the round-robin method (as it happens, for example, with a fastload).
The integration of data in an object store requires only a few steps.
First, we need to define an external table. This definition references a container (e.g. an S3 bucket in AWS). A variety of objects can reside in this container. For performance reasons, we can apply so-called path filtering. We define the path from which files are read. Path filtering has nothing to do with the filter we know e.g. from SQL where we apply a WHERE condition. The advantage of path filtering is its performance. Only files that are really needed are read, decompressed, decrypted, transformed, etc., by Teradata.
This transparent integration of external data stored in object stores would have required a complex ETL process not so long ago. Teradata NOS can not only be used to query external data but also offload data (e.g. cold data).
However, the great thing about such offloaded data is that it can be accessed anytime via external table definitions. This is, of course, a big advantage over traditional backup and restore approaches.
What is the difference between Teradata NOS and the Query Grid?
Querygrid is based on a completely different design than Teradata NOS. With Teradata Querygrid, there are always two processing or SQL engines involved. When Teradata sends a query to another engine, the SQL is practically transmitted and executed on the external system. The result is then sent from the external system back to Teradata.
Teradata NOS, in contrast, accesses the external data directly and is also fully integrated with the Teradata SQL Engine, unlike the query grid. Even though Teradata NOS currently supports only three object store file formats (Parquet, CSV, JSON), it is a promising development in the right direction. Currently, we still have to use Query Grid to access e.g. HDFS, but I assume that Teradata will support more file formats in the future.
If you want to learn how easy it is to integrate an external table into your SQL queries, please watch this video, where we show you step-by-step how it works: