Teradata vs Redshift Merge Join Strategies

DWH Pro Admin

February 13, 2020

minutes reading time

Teradata and Redshift are very similar in architecture and how data is distributed. For Teradata, it is the AMPs that hold part of the data in a table and for Redshift, it is the slices.

There are significant differences regarding how the data is stored on the file system.

Teradata can also be used as a Column Store (and this decision can be made per table), but the strengths are the great performance that comes from storing tables in the traditional row format without having to switch to columnar tables with all their disadvantages in Teradata (we will cover in another blog post).

We won’t go into Teradata Columnar Storage in this comparison of join strategies, significantly since Teradata anyway transforms the columns of a columnar table into rows before processing the data (early materialization), and therefore there is no strategic difference in the processing of the joins (even if only required columns are selected for columnar tables).

Merge Join Preparation: Co-Locating Rows

Both Teradata and Redshift use hashing to distribute data evenly among the parallel units (Teradata AMPs, Redshift Slices).

As we know from Teradata, rows can only be joined if they are on the same AMP.

Similarly, Redshift requires that the data be on the same slice. So there is not much difference.

Join Preparation in Redshift is similar to that in Teradata.
Copying the entire table to all AMPs in Teradata is equivalent to broadcasting in Redshift.

Redistribution in Teradata is the same as in Redshift and happens by hashing the join columns in Teradata, in Redshift, a new DISTKEY is determined and then distributed.

As in Teradata, it is possible that none of the tables needs to be redistributed because both have the same DISTKEY, or both tables need to be redistributed.

Like Teradata, Redshift always tries to copy or distribute the smaller table,

The two systems differ in Redshift’s ability to keep copies of a table on all slices permanently. This may sound like an advantage at first, but I would like to point out that this only makes sense for small tables. I don’t think this will result in an enormous performance advantage over Teradata.

Comparison Of Join Methods

The Teradata Merge Join

The requirements for the data are that both tables have the same primary index.

This automatically means that for non-row-partitioned tables, since rows are then sorted by hash, a binary search can be performed from a hash value of the left table in the right table and vice versa (depending on whether slow path or fast path merge joins are performed the joining of rows is slightly different, but the method is the same).

For Row-Partitioned Tables, there is also the requirement to join the partition columns so that partitions can be compared directly (because within each partition, the rows are sorted by hash value).

The Redshift Merge Join

Merge Join in Redshift requires that the DISTKEY of both tables be the same. So there is no difference between the two systems. Accordingly, in both systems, one or both tables may have to be redistributed to a new primary index (Teradata) or DISTKEY (Redshift).

In Redshift, both tables must also have the same SORTKEY. The problem in Redshift is that data sorting is not as elegant as in Teradata.

In Teradata, the rows are always automatically sorted in ascending order by Rowhash. In Redshift, the DBA has to ensure that the deleted data is finally removed and that the data is sorted.

If more than 20% of the data is not sorted, Redshift does not use a merge join as sorting a vast amount of data is expensive.

Both systems rely on statics when it comes to the creation of join plans. But as you can see, Redshift is much more maintenance intensive than Teradata because there are additional things to consider.


Although both Redshift and Teradata offer the Merge Join method, Teradata is more mature and easier to maintain.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like