fbpx

Functions applied to join columns are always the result of a bad physical data model.

It is often the case that not even the first normal form was adhered to in the data model.

But apart from causing the usual abnormalities and problems, in Teradata, it also hurts query performance. Here is a summary:

What is the effect of functions on the join performance?

Most functions prevent the optimizer from using existing statistics on the join columns when functions are applied to them.

Functions such as SUBSTR(), TRIM(), COALESCE(), CASE statements, and often the LIKE function cause the optimizer to estimate the number of rows based on heuristics.

What should I do if I cannot prevent the use of functions?

There are different techniques to prevent bad join performance. All of them are just workarounds for a bad data model.
One way is to work with volatile tables by normalizing the columns in a volatile table before the join.

What other disadvantages do functions on join columns have?

Most functions force the optimizer to perform a full table scan.

Are there functions that have no negative impact on performance?

Yes, for example, the optimizer statistic can use histograms if a LIKE function is used in which the placeholder is at the end, like in this statement:

SELECT * FROM Customer WHERE LastName LIKE ‘Wen%’;

But here too, various methods can be used to improve the implementation plans:

LIKE ‘a%’ versus SUBSTR(column,1,1) = ‘a

In conclusion, I would like to stress that the problems mentioned above are mainly due to a poor data model. So this is the primary starting point. Workarounds are seldom suitable for long-term survival.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
Buy the Book Teradata Query Performance Tuning

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Only in the join part. The select list doesn’t matter.

    Still, if you can avoid functions on your WHERE conditions it’s an advantage.

    Since Teradata 14.10 you can apply statistics on functions, still, they are most likely only useful for exactly the one SQL you designed them for. On the other hand, simple column statistics are normally useful in many queries.

    If no statistics can be used, heuristics are applied, which can be completely wrong as they assume a fixed amount of rows being returned.

  • Hi Roland,

    Functions on join columns should be avoided only in the join statement? Or throughout the entire SQL statement?
    I’m asking from a statistics usage perspective.
    If my join is clean but I’m doing a function on join column while projecting it in the SELECT part, does the optimizer still not use column’s statistics?

    Regards,
    Virendra

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >