March 10

2 comments

Teradata Functions On Join Columns and Performance

By Roland Wenzlofsky

March 10, 2014

joins, sql

Functions applied to join columns are always the result of a bad physical data model.

It is often the case that not even the first normal form was adhered to in the data model.

But apart from causing the usual abnormalities and problems, in Teradata, it also has a negative effect on query performance. Here is a brief summary:

What is the effect of functions on the join performance?

Most functions prevent the optimizer from using existing statistics on the join columns when functions are applied to them.

Functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and often the LIKE function cause the optimizer to estimate the number of rows based on heuristics.

What should I do if I cannot prevent the use of functions?

There are different techniques to prevent bad join performance. All of them are just workarounds for a bad data model.
One way is to work with volatile tables by normalizing the columns in a volatile table before the join.

What other disadvantages do functions on join columns have?

Most functions force the optimizer to perform a full table scan.

Are there functions that have no negative impact on performance?

Yes, for example, the optimizer statistic can use histograms if a LIKE function is used in which the placeholder is at the end, like in this statement:

SELECT * FROM Customer WHERE LastName LIKE ‘Wen%';

But here too, various methods can be used to improve the implementation plans:

LIKE ‘a%' versus SUBSTR(column,1,1) = ‘a

In conclusion, I would like to stress once again that the problems mentioned above are mainly due to a poor data model. So this is the primary starting point. Workarounds are seldom suitable for long-term survival.

Roland Wenzlofsky


Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like

  • Only in the join part. The select list doesn’t matter.

    Still, if you can avoid functions on your WHERE conditions it’s an advantage.

    Since Teradata 14.10 you can apply statistics on functions, still, they are most likely only useful for exactly the one SQL you designed them for. On the other hand, simple column statistics are normally useful in many queries.

    If no statistics can be used, heuristics are applied, which can be completely wrong as they assume a fixed amount of rows being returned.

    Reply

  • Hi Roland,

    Functions on join columns should be avoided only in the join statement? Or throughout the entire SQL statement?
    I’m asking from a statistics usage perspective.
    If my join is clean but I’m doing a function on join column while projecting it in the SELECT part, does the optimizer still not use column’s statistics?

    Regards,
    Virendra

    Reply

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    Never miss a good story!

     Subscribe to our newsletter to keep up with the latest trends!

    >