What is Partial Group By?
Joins are very expensive. Before there was PARTIAL GROUP BY, the join was first executed, then the result of the join was aggregated.
The idea behind PARTIAL GROUP BY is simple: Aggregations that can be performed before the join (without changing the semantics of the query) reduce the amount of data that has to be redistributed or duplicated to all AMPs during the join preparation.
Obviously, the less distinct values the GROUP BY columns have, the greater the savings.
A distinction is made between Early GROUP BY and Partial GROUP BY. The only difference is that Early GROUP BY performs all aggregations before the join, Partial GROUP BY performs part of the aggregation before the join and part after the join. However, this is the only difference.
Here is an example of a query that is predestined for PARTIAL GROUP BY:
FROM table1 INNER JOIN table2 ON table1.key = table2.key
GROUP BY 1;
In this case, it makes no difference whether the join occurs first and then the aggregation or vice versa. Depending on the number of rows per table and the different values in the column “fact” of both tables, the performance can be very different.
I tried it and it reduces the runtime by 50%. Btw are you offering also TD performance trainings?
Yes, just write me an email to [email protected]