What is Partial Group By?
Joins are costly. Before the introduction of PARTIAL GROUP BY, the join would be performed first, and then the aggregated result would follow.
PARTIAL GROUP BY reduces the amount of data that must be redistributed or duplicated to all AMPs during join preparation by performing aggregations before the join, without altering the query’s semantics.
The savings increase as the number of unique values in the GROUP BY columns decreases.
Early GROUP BY and Partial GROUP BY are distinguishable solely based on their approach to aggregations in relation to the join process. While Early GROUP BY executes all aggregations before joining, Partial GROUP BY conducts a portion of the aggregation before and after the join. Nonetheless, this is the only point of differentiation.
This query is optimized for PARTIAL GROUP BY:
SELECT table1.key,SUM(table1.fact),SUM(table2.fact)
FROM table1 INNER JOIN table2 ON table1.key = table2.key
GROUP BY 1;
The order of the join and aggregation is insignificant in this scenario. The performance may vary based on the number of rows and distinct values in the “fact” column of each table.
I tried it and it reduces the runtime by 50%. Btw are you offering also TD performance trainings?
Yes, just write me an email to [email protected]