Negative Impact of Applying Functions to Join Columns in Teradata Joins: Performance Implications and Solutions

Roland Wenzlofsky

May 5, 2023

minutes reading time

Introduction

Applying column joins in Teradata using functions may indicate a flawed data model, potentially violating the first normal form.

SELECT A.column1, A.column2, B.column1, B.column2
FROM tableA A
JOIN tableB B
ON SUBSTRING(A.join_column, 1, 3) = B.join_column;

This article investigates the adverse impact on query performance that abnormalities can cause in Teradata join scenarios and suggests potential solutions to mitigate such consequences.

What is the effect of functions on the join performance?

Applying functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and even the LIKE function to join columns can hinder the optimizer’s ability to use existing statistics. Consequently, the optimizer must estimate the number of rows based on heuristics, which may result in inefficient query execution.

What should I do if I cannot prevent the use of functions?

While fixing the underlying data model is the optimal solution for Teradata joins, alternative methods can help reduce the negative impact of using functions on join columns. One approach is creating volatile tables and normalizing columns before joining the operation. However, remember that these techniques are workarounds for an inadequate data model.

What other disadvantages do functions on join columns have?

Most functions compel the optimizer to perform a full table scan, which can be resource-intensive and time-consuming.

Are there functions that have no negative impact on performance?

Certain functions, such as the LIKE function with the ‘%’ pattern matching parameter at the end, allow the optimizer to use histograms for statistics.
For example:
SELECT * FROM Customer WHERE LastName LIKE ‘Wen%’;
Nevertheless, various methods can be employed to enhance implementation plans, such as using “LIKE ‘a%'” instead of “SUBSTR(column,1,1) = ‘a'”.

Conclusion

An inadequate data model mainly causes the problems addressed in this article. Attention to the data model should be the top priority since temporary solutions are seldom effective in achieving long-term success. Allocating resources for data model enhancement will advance performance and lessen complexities in Teradata joins.

Take a brief survey and receive the book for free Get instant access

Share0

Tweet0

Share0

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

Hi Roland,

Functions on join columns should be avoided only in the join statement? Or throughout the entire SQL statement?
I’m asking from a statistics usage perspective.
If my join is clean but I’m doing a function on join column while projecting it in the SELECT part, does the optimizer still not use column’s statistics?

Regards,
Virendra

Only in the join part. The select list doesn’t matter.

Still, if you can avoid functions on your WHERE conditions it’s an advantage.

Since Teradata 14.10 you can apply statistics on functions, still, they are most likely only useful for exactly the one SQL you designed them for. On the other hand, simple column statistics are normally useful in many queries.

If no statistics can be used, heuristics are applied, which can be completely wrong as they assume a fixed amount of rows being returned.

Negative Impact of Applying Functions to Join Columns in Teradata Joins: Performance Implications and Solutions

Introduction

What is the effect of functions on the join performance?

What should I do if I cannot prevent the use of functions?

What other disadvantages do functions on join columns have?

Are there functions that have no negative impact on performance?

Conclusion

Roland Wenzlofsky

Fast multi-file export of Teradata query results using only Teradata SQL Assistant

The Teradata AMP Worker Task

Boost Your Teradata Performance – The Critical Role of NOT NULL Declarations

Optimizing Teradata SQL Queries by Avoiding Full Table Scans and Utilizing Secondary Indexes