April 15


Teradata Map Reduce – SQL JOIN and GROUP BY

By Roland Wenzlofsky

April 15, 2014

group by, hadoop, map reduce

Map Reduce is one of the Hadoop core functionalities, which most database vendors are adding to their RDBMS. The following example should help you to understand the relation between a traditional RDBMS and Hadoop.

As an example, we use a SQL aggregation statement, which is joining two tables together; Based on this example, we will show how logic can be pushed down to the “Hadoop implementation” of the RDBMS, which is being handled by the Map-Reduce algorithm:

[highlight] SELECT
Client t01
ClientType t02
t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]

Above SQL statement is summing up the client balances for each client segment (private, business, etc.).

Let's see how this can be implemented with a Map-Reduce algorithm.

Our example SQL aggregation statement from above cannot be covered by only one Map-Reduce task.

In a first step, the JOIN step is done. The data is sorted by clienttype_id, allowing each reducer to emit the balance for each segment. In other words, the first Map-Reduce task does the join of the Client and ClientType table, but is not doing the grouping:

join task
join task

The second Map-Reduce task sorts the data by segment so that all rows for a particular segment are being emitted to the same Reducer. The reducer does the summarizing needed in the GROUP BY statement.


Above example showed, how RDBMS functionality can be designed in Map Reduce. It helps to understand in a better way, what we are relating to, when analyzing the ability of an RDBMS like Teradata, to push down joining and aggregation functionality to Map Reduce.

Roland Wenzlofsky

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

You might also like

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Never miss a good story!

 Subscribe to our newsletter to keep up with the latest trends!