How does Teradata handle Skew in Joins? The skewed workload is one of the two most important problems we have to solve in Performance Tuning. This is especially important for join steps, as large amounts of data may be copied between the AMPs. The optimizer has techniques to execute joins separately for skewed primary index

Read More

Teradata can choose from various join methods. The most common ones are Merge Join, Hash Join, Product Join, or for PPI tables, the Merge Join variants Rowkey Based Merge Join, or Sliding Window Merge Join. Less often, however, we see the Teradata nested join, since specific prerequisites must be met before it can be used.

Read More

What is Teradata Join Estimation? This article shows how Teradata Join Estimation works when no statistics have been collected. We show which heuristics are used to estimate the number of rows, and why it is important to always collect statistics on all join columns. Teradata Join Estimation Heuristics The worst case is the following: 2

Read More

Joining tables is one of the most expensive operations as each join requires that the rows being joined are on the same AMP. If the two tables do not have the same primary index, it is necessary to copy one or both tables’ rows. This can cause a considerable number of I/Os in large tables.

Read More

Teradata 16 – UNION ALL Starting with Teradata 16, the optimizer has an additional option to handle sets of rows which are combined with “UNION ALL” in views and derived tables. Up to Version 15.10, the sets of rows which are combined with “UNION ALL” are always moved into one common spool before any other

Read More

When comparing the performance of different Teradata join types, we have to consider the total costs of the join strategy chosen for a particular join type. A join plan can consist of several steps, and each of them is causing a certain amount of costs (I/Os and CPU seconds). The total join costs depend on:

Read More

The following case study shows how we can sometimes improve query performance by slightly changing the query text. The starting point is below the SQL statement. Both tables’ primary index is PK (one of the tables is a volatile table, but this doesn’t differ for our example). One of the tables is partitioned. SELECT *

Read More

Joining partitioned tables can be quite expensive, especially if partitions don’t match or one of the tables is not partitioned. Today I discovered an interesting trick that can be applied to reduce resource usage if joining two tables with matching Primary Indexes and Partitions. The test setup consists of the following two tables (both tables

Read More

Sometimes natural skew can become a huge problem, causing bad join performance. The Partial Duplication & Partial Redistribution (PDPR)  feature on Teradata 14 (and above) helps to reduce this issue, but will not always be able to detect all possible applications (for example, if statistics are not revealing the skewed values). If you are stuck

Read More

Business intelligence tools often create queries with many outer joins. This article proves why outer joins are vulnerable to skewing. We will demonstrate that even if both tables are not skewed, it can be that the plan step by the first join step is biased. Let me carve out the details based on a simple test scenario: CREATE TABLE table1 (pk INTEGER, a INTEGER)

Read More