Hash Join Archives

Skewed Joins, Straight Answers: A Neutral Guide for Snowflake/Teradata Teams

04/11/2026 | 09/22/2025 by Roland Wenzlofsky

Snowflake’s physical join execution is predominantly hash-based. In practice you’ll observe hash-join variants with two distributions. If you come from Teradata, the intent will feel familiar: both systems aim to co-locate equal keys before matching. This article explains Snowflake’s strategies, maps them to Teradata’s (including dynamic plan fragments), and shows how to recognize and mitigate …

The Pitfalls of Teradata SELECT * Queries

04/11/2026 | 05/05/2023 by Roland Wenzlofsky

Introduction: In a row-oriented database engine like Teradata, data is organized and stored in units called data blocks. Each data block features a fixed header and accommodates multiple rows. Every row consists of a record header followed by its corresponding columns. When a database retrieves and stores a data block in the cache, it accesses …

Teradata Join Strategies: How to Optimize Join Operations

04/11/2026 | 05/03/2023 by Roland Wenzlofsky

Introduction: Teradata offers several methods for conducting joins, but all share one prerequisite: the rows to be joined must reside on the same AMP. The chosen method for joining and relocating data is called a join strategy. The preparation for each join method varies. The choice of Teradata join strategy utilized by the Optimizer is determined by …

Mastering Teradata Performance Tuning

04/11/2026 | 05/03/2023 by Roland Wenzlofsky

The Art of Teradata Performance Tuning: As a Teradata Performance Tuner, you need technical expertise and experience, and occasionally a bit of luck. Using this example, I’ll demonstrate the remarkable results that rephrasing a query can achieve. Assuming this scenario: One table has a minimal number of rows, while the other is partitioned and …

Exploring the Different Join Methods in Relational Database Systems: Pros, Cons, and Use Cases

04/11/2026 | 05/02/2023 by Roland Wenzlofsky

Introduction: Relational databases are essential for contemporary data management and analysis. Joining tables, which merges data from two or more tables based on a shared column or condition, is a fundamental operation in these systems. Various join methods exist in relational databases, each with unique advantages and disadvantages. This article examines the different join methods, …

Teradata SQL Tuning: How Query Rewriting Can Reduce Runtime from 40 Minutes to Seconds

04/11/2026 | 05/02/2023 by Roland Wenzlofsky

It’s time to share a new Teradata SQL tuning case study that showcases the impressive impact of query rewriting on performance. We are studying the query below that originally took 40 minutes to run. As a SQL tuning specialist, I always prioritize adding missing statistics and refreshing stale ones. I analyzed the SQL statement that …

Understanding Teradata Join Estimation: Heuristics and Importance of Statistics Collection

04/11/2026 | 05/02/2023 by Roland Wenzlofsky

What is Teradata Join Estimation? This article demonstrates the functioning of Teradata Join Estimation in the absence of statistics. It presents the heuristics employed to estimate row count and emphasizes the importance of collecting statistics on all join columns. Teradata Join Estimation Heuristics: The worst-case scenario involves joining two tables without any collected statistics. We …

Optimizing Teradata Statements Containing Multiple JOINS

04/11/2026 | 05/02/2023 by Roland Wenzlofsky

1. Outline: This showcase demonstrates optimizing statements with multiple JOINs using Teradata Optimizer’s tuning approach. The approach efficiently determines the best JOIN strategy and implements data redistribution instead of duplication when necessary. Identify and break down underperforming segments to optimize complex logic with multiple joins. Employ an execution plan and monitor query performance and resource …

Choosing the Right Teradata Data Types

04/11/2026 | 04/28/2023 by Roland Wenzlofsky

How Do I Select The Appropriate Data Type In Teradata? Converting datatypes incurs substantial costs and demands significant CPU resources when dealing with extensive tables. Incorrect data type selection hinders the execution plan. This article will discuss selecting appropriate data types for optimal performance. Consistency in selecting data types across different tables is crucial, as …