Functions applied to join columns are always the result of a wrong physical data model.
It is often the case that not even the first normal form was adhered to in the data model.
But apart from causing the characteristic abnormalities and problems, it also hurts query performance. Here is a summary:
What is the effect of functions on the join performance?
Most functions prevent the optimizer from using existing statistics on the join columns when functions are applied to them.
Functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and often the LIKE function cause the optimizer to estimate the number of rows based on heuristics.
What should I do if I cannot prevent the use of functions?
There are different techniques to prevent bad join performance. All of them are just workarounds for a wrong data model.
One way is to work with volatile tables by normalizing the columns in a volatile table before the join.
What other disadvantages do functions on join columns have?
Most functions force the optimizer to perform a full table scan.
Are there functions that have no negative impact on performance?
Yes, for example, the optimizer statistic can use histograms if a LIKE function is used in which the placeholder is at the end, like in this statement:
SELECT * FROM Customer WHERE LastName LIKE ‘Wen%’;
But here, too, various methods can be used to improve the implementation plans:
LIKE ‘a%’ versus SUBSTR(column,1,1) = ‘a
In conclusion, I would like to stress that the problems mentioned above are mainly due to a poor data model. So this is the primary starting point. Workarounds are seldom suitable for long-term survival.