Teradata Optimization with Partial Group by in Joins

Roland Wenzlofsky

September 23, 2019

minutes reading time


What is Partial Group By?

Joins are very expensive. Before there was PARTIAL GROUP BY, the join was first executed, then the result of the join was aggregated.

The idea behind PARTIAL GROUP BY is simple: Aggregations that can be performed before the join (without changing the semantics of the query) reduce the amount of data that has to be redistributed or duplicated to all AMPs during the join preparation.

Obviously, the less distinct values the GROUP BY columns have, the greater the savings.

A distinction is made between Early GROUP BY and Partial GROUP BY. The only difference is that Early GROUP BY performs all aggregations before the join, Partial GROUP BY performs part of the aggregation before the join and part after the join. However, this is the only difference.

Here is an example of a query that is predestined for PARTIAL GROUP BY:

SELECT table1.key,SUM(table1.fact),SUM(table2.fact)
FROM table1 INNER JOIN table2 ON table1.key = table2.key
GROUP BY 1;

In this case, it makes no difference whether the join occurs first and then the aggregation or vice versa. Depending on the number of rows per table and the different values in the column “fact” of both tables, the performance can be very different.

  • Avatar
    Maria SPB says:

    I tried it and it reduces the runtime by 50%. Btw are you offering also TD performance trainings?

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >