Teradata Optimization with Partial Group by in Joins
2

Teradata Optimization with Partial Group by in Joins

What is Partial Group By?

Joins are very expensive. Before there was PARTIAL GROUP BY, the join was first executed, then the result of the join was aggregated.

The idea behind PARTIAL GROUP BY is simple: Aggregations that can be performed before the join (without changing the semantics of the query) reduce the amount of data that has to be redistributed or duplicated to all AMPs during the join preparation.

Obviously, the less distinct values the GROUP BY columns have, the greater the savings.

A distinction is made between Early GROUP BY and Partial GROUP BY. The only difference is that Early GROUP BY performs all aggregations before the join, Partial GROUP BY performs part of the aggregation before the join and part after the join. However, this is the only difference.

Here is an example of a query that is predestined for PARTIAL GROUP BY:

SELECT table1.key,SUM(table1.fact),SUM(table2.fact)
FROM table1 INNER JOIN table2 ON table1.key = table2.key
GROUP BY 1;

In this case, it makes no difference whether the join occurs first and then the aggregation or vice versa. Depending on the number of rows per table and the different values in the column “fact” of both tables, the performance can be very different.

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

  • Avatar Maria SPB says:

    I tried it and it reduces the runtime by 50%. Btw are you offering also TD performance trainings?

  • >