Introduction

Applying column joins in Teradata using functions may indicate a flawed data model, potentially violating the first normal form.

SELECT A.column1, A.column2, B.column1, B.column2
FROM tableA A
JOIN tableB B
ON SUBSTRING(A.join_column, 1, 3) = B.join_column;

This article investigates the adverse impact on query performance that abnormalities can cause in Teradata join scenarios and suggests potential solutions to mitigate such consequences.

What is the effect of functions on the join performance?

Applying functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and even the LIKE function to join columns can hinder the optimizer’s ability to use existing statistics. Consequently, the optimizer must estimate the number of rows based on heuristics, which may result in inefficient query execution.

What should I do if I cannot prevent the use of functions?

While fixing the underlying data model is the optimal solution for Teradata joins, alternative methods can help reduce the negative impact of using functions on join columns. One approach is creating volatile tables and normalizing columns before joining the operation. However, remember that these techniques are workarounds for an inadequate data model.

What other disadvantages do functions on join columns have?

Most functions compel the optimizer to perform a full table scan, which can be resource-intensive and time-consuming.

Are there functions that have no negative impact on performance?

Certain functions, such as the LIKE function with the ‘%’ pattern matching parameter at the end, allow the optimizer to use histograms for statistics.
For example:
SELECT * FROM Customer WHERE LastName LIKE ‘Wen%’;
Nevertheless, various methods can be employed to enhance implementation plans, such as using “LIKE ‘a%'” instead of “SUBSTR(column,1,1) = ‘a'”.

Conclusion

An inadequate data model mainly causes the problems addressed in this article. Attention to the data model should be the top priority since temporary solutions are seldom effective in achieving long-term success. Allocating resources for data model enhancement will advance performance and lessen complexities in Teradata joins.

  • Hi Roland,

    Functions on join columns should be avoided only in the join statement? Or throughout the entire SQL statement?
    I’m asking from a statistics usage perspective.
    If my join is clean but I’m doing a function on join column while projecting it in the SELECT part, does the optimizer still not use column’s statistics?

    Regards,
    Virendra

  • Only in the join part. The select list doesn’t matter.

    Still, if you can avoid functions on your WHERE conditions it’s an advantage.

    Since Teradata 14.10 you can apply statistics on functions, still, they are most likely only useful for exactly the one SQL you designed them for. On the other hand, simple column statistics are normally useful in many queries.

    If no statistics can be used, heuristics are applied, which can be completely wrong as they assume a fixed amount of rows being returned.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >