September 23


Teradata Optimization with Partial Group by in Joins

By Roland Wenzlofsky

September 23, 2019

group by, tuning

What is Partial Group By?

Joins are very expensive. Before there was PARTIAL GROUP BY, the join was first executed, then the result of the join was aggregated.

The idea behind PARTIAL GROUP BY is simple: Aggregations that can be performed before the join (without changing the semantics of the query) reduce the amount of data that has to be redistributed or duplicated to all AMPs during the join preparation.

Obviously, the less distinct values the GROUP BY columns have, the greater the savings.

A distinction is made between Early GROUP BY and Partial GROUP BY. The only difference is that Early GROUP BY performs all aggregations before the join, Partial GROUP BY performs part of the aggregation before the join and part after the join. However, this is the only difference.

Here is an example of a query that is predestined for PARTIAL GROUP BY:

SELECT table1.key,SUM(table1.fact),SUM(table2.fact)
FROM table1 INNER JOIN table2 ON table1.key = table2.key

In this case, it makes no difference whether the join occurs first and then the aggregation or vice versa. Depending on the number of rows per table and the different values in the column “fact” of both tables, the performance can be very different.

Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    Never miss a good story!

     Subscribe to our newsletter to keep up with the latest trends!