Teradata Functions On Join Columns and Performance
Functions applied to join columns are always the result of a bad physical data model.
It is often the case that not even the first normal form was adhered to in the data model.
But apart from causing the usual abnormalities and problems, in Teradata, it also has a negative effect on query performance. Here is a brief summary:
Most functions prevent the optimizer from using existing statistics on the join columns when functions are applied to them.
Functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and often the LIKE function cause the optimizer to estimate the number of rows based on heuristics.
There are different techniques to prevent bad join performance. All of them are just workarounds for a bad data model.
One way is to work with volatile tables by normalizing the columns in a volatile table before the join.
Most functions force the optimizer to perform a full table scan.
Yes, for example, the optimizer statistic can use histograms if a LIKE function is used in which the placeholder is at the end, like in this statement:
SELECT * FROM Customer WHERE LastName LIKE ‘Wen%';
But here too, various methods can be used to improve the implementation plans:
LIKE ‘a%' versus SUBSTR(column,1,1) = ‘a
In conclusion, I would like to stress once again that the problems mentioned above are mainly due to a poor data model. So this is the primary starting point. Workarounds are seldom suitable for long-term survival.