The Vulnerability of Outer Joins in Business Intelligence Tools: A Simple Test Scenario

Business intelligence tools frequently utilize queries with multiple outer joins. This article presents evidence supporting the vulnerability of outer joins to skewing. We will illustrate that, regardless of whether both tables are skewed, there exists a possibility that the plan step in the initial joins may be biased.

Let me outline the details based on a simple test scenario:

CREATE TABLE table1 (pk INTEGER, a INTEGER) PRIMARY INDEX (pk);
CREATE TABLE table2 (pk INTEGER, b INTEGER, c INTEGER) PRIMARY INDEX (pk);
CREATE TABLE table3 (pk INTEGER, d INTEGER, e INTEGER) PRIMARY INDEX (pk);

SELECT a,b,c,d,e FROM table1 t01 LEFT JOIN table2 t02 ON t01.a = t02.b LEFT JOIN table3 t03 ON t02.c = t03.d;

SELECT a,b,c,d,e FROM table1 t01 LEFT JOIN table2

The Optimizer can generate an execution plan for the aforementioned setup (tables and query) that consists of the following steps:

1. Redistribute table1 on column a, and redistribute table2 on column b (in parallel).

2. Merge join (outer join) the resulting spools (from step 1). The columns of the resulting spool will be padded with NULL values for columns not matching rows.

2. Merge join (outer join) the resulting spools

3. Let’s assume that the resulting spool (i.e., the spool which holds the joined result of table1 and table2) is still not skewed. We will call this spool s(table1,table2) for simplicity.

4. The result spool s(table1,table2) is redistributed on column c.

5. Table3 is redistributed on column d.

The skewing issues will arise as the join of table1 and table2 may result in numerous non-matching rows, causing multiple NULL values to populate column c of the s(table1,table2).

The rowhash value of a NULL value is deterministic and consistent – every NULL value will always hash to the same AMP.

If only a few rows match during the join of table1 and table2, redistributing s(table1,table2) based on column c (necessary for joining with table3) will relocate any unmatched rows to the AMP holding NULL values.

If only a few rows match during the join of

Initially, our situation was ideal with three unskewed tables. However, executing the second outer join resulted in a skewed spool.

Business intelligence tools often generate queries containing many outer joins, which can significantly increase the likelihood of skewing.

Teradata 14.10 employs techniques such as PDPR to prevent join skewing.

Statistics are essential to ensure the availability of PDPR as the Optimizer may not recognize all applicable situations.

When manually coding queries, only use outer joins when business logic dictates.

Avoid using outer joins unnecessarily to prevent losing rows, which has become a prevalent practice.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

📊 Data Platform Migration Survey

Help us map where the industry is heading. Results are public — see what others chose.

1. What is your current data platform?

2. Where are you migrating to (or evaluating)?

Migrating FROM
Migrating TO

Thanks for voting! Share this with your network.

Follow me on LinkedIn for daily insights on data warehousing and platform migrations.

Stay Ahead in Data Warehousing

Get expert insights on Teradata, Snowflake, BigQuery, Databricks, Microsoft Fabric, and modern data architecture — delivered to your inbox.

Leave a Comment

DWHPro

Expert network for enterprise data platforms. Senior consultants, project teams built for your challenge — across Teradata, Snowflake, Databricks, and more.

📍Vienna, Austria & Jacksonville, Florida

Quick Links
Services Team Teradata Book Blog Contact Us
Connect
LinkedIn → [email protected]
Newsletter

Join 4,000+ data professionals.
Weekly insights on Teradata, Snowflake & data architecture.