Teradata Vs. Redshift: A Comparison Of Join Strategies And Architecture

Teradata and Redshift share similar architectures and data distribution methods. Teradata’s AMPs store portions of table data, while Redshift utilizes slices.

There are notable differences in the way data is stored on file systems.

Teradata can function as a Column Store, which can be determined on a per-table basis. However, the primary advantage lies in the superior performance achieved by storing tables in the standard row format without switching to columnar tables and their associated drawbacks in Teradata (which will be discussed in a future blog post).

We will not delve into Teradata Columnar Storage in this comparison of join strategies, as Teradata converts the columns of a columnar table into rows before processing the data (early materialization). As a result, there is no difference in the processing of the joins, even if only the necessary columns are chosen for columnar tables.

Table of Contents

Merge Join Preparation: Co-Locating Rows

Teradata and Redshift utilize hashing to distribute data evenly among parallel units, such as Teradata AMPs and Redshift Slices.

Teradata dictates that rows must be on the same AMP to be joined.

Redshift also requires data to be on the same slice, resulting in little disparity.

Preparation in Redshift resembles that of Teradata. Broadcasting in Redshift is the equivalent of copying the entire table to all AMPs in Teradata.

Teradata and Redshift both use hashing to redistribute data based on join columns. However, in Redshift, a new DISTKEY is established before distribution.

In Teradata, it is possible that table redistribution may not be required if both tables share the same DISTKEY. Alternatively, both tables may require redistribution.

When copying or distributing tables, Redshift operates similarly to Teradata.

Redshift and Teradata vary in their approach to storing table copies on slices. While Redshift has the capability to do so permanently, it is only suitable for smaller tables. Thus, I do not believe this feature provides a significant performance advantage over Teradata.

Comparison Of Join Methods

The Teradata Merge Join

Both tables must have identical primary indices to fulfill the co-location requirement.

For non-row-partitioned tables, rows are sorted by hash, enabling a binary search to occur from a hash value of the left table in the right table and vice versa. The joining of rows differs slightly depending on whether slow-path or fast-path merge joins are performed, but the method remains the same.

To compare partitions directly in Row-Partitioned Tables, it is necessary to join the partition columns, as the rows within each partition are sorted by hash value.

The Redshift Merge Join

For Merge Join in Redshift, the tables must have matching DISTKEYs, making the two systems interchangeable. Therefore, either one or both tables may need to be redistributed to a new primary index (Teradata) or DISTKEY (Redshift) in both systems.

Redshift requires both tables to have identical SORTKEYs. However, data sorting in Redshift is less sophisticated than in Teradata.

Teradata automatically sorts rows in ascending order by Rowhash, whereas in Redshift, the DBA must ensure data is deleted and re-sorted periodically.

Redshift doesn’t employ a merge join if over 20% of the data is unsorted since sorting large amounts is costly.

Both systems utilize statistics for creating join plans. However, Redshift requires more maintenance due to the need to consider additional factors compared to Teradata.

Summary

Teradata is a more mature and easier-to-maintain option for the Merge Join method.

Related Services

🏗️ Planning a Data Platform Migration?

Architecture-first approach: we design before a single line of code is written. Zero data loss across every migration delivered.

Our Migration Services →

Teradata vs. Redshift: A Comparison of Join Strategies and Architecture

Merge Join Preparation: Co-Locating Rows

Comparison Of Join Methods

The Teradata Merge Join

The Redshift Merge Join

Summary

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

Leave a Comment Cancel reply