Functions on JOIN columns in Teradata

There are two major problems, when using functions on join columns.

  1. The Teradata Optimizer cannot make use of statistics; String manipulation functions like SUBSTR(), TRIM(), the LIKE function, COALESCE() and CASE WHEN statements force the optimizer to use heuristics to make its estimations.In case you cannot avoid using a function,  try at least different strategies, and figure out which function allows for the best execution planNevertheless, using functions in your joins, means that you have a bad physical data model. Either it’s technically outdated, inaccurate on the logical data model, or it has to serve two masters.

    Ask yourself: Why there is no separate column that stores the essential information as you need it?

  2. Using functions on your join columns forces the optimizer to do a full table scan, slowing down your queries

Not each function is equal in its negative impact on performance. I experienced different results for applications of LIKE ‘a%’ versus SUBSTR(column,1,1) = ‘a’ .
You have to experiment a little bit with the various alternatives and chose the one that impairs performance the least. The best approach, though,  is to avoid the usage of functions on any join column by design.

Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Virendra says:

    Hi Roland,

    Functions on join columns should be avoided only in join statement? Or throughout the entire SQL statement?
    I’m asking from statistics usage perspective.
    If my join is clean but I’m doing a function on join column while projecting it in SELECT part, does the optimizer still not use column’s statistics?


  • Only in the join part. The select list doesn’t matter.

    Still, if you can avoid functions on your WHERE conditions it’s an advantage.

    Since Teradata 14.10 you can apply statistics on functions, still they are most likely only useful for exactly the one SQL you designed them for. On the other hand, simple column statistics are normally useful in many queries.

    If no statistics can be used, heuristics are applied, which can be completly wrong as they assume a fix amount of rows being returned.

  • >