Teradata Map Reduce – SQL JOIN and GROUP BY

0
185

 

teradata map reduce

Map Reduce is one of the Hadoop core functionalities, which most database vendors are adding to their RDBMS. The following example should help you to understand the relation between a traditional RDBMS and Hadoop.

As an example, we use a SQL aggregation statement, which is joining two tables together; Based on this example, we will show how logic can be pushed down to the “Hadoop implementation” of the RDBMS, which is being handled by the Map-Reduce algorithm:

[highlight] SELECT
Segment,
SUM(balance)
FROM
Client t01
INNER JOIN
ClientType t02
ON
t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]

Above SQL statement is summing up the client balances for each client segment (private, business, etc.).

Let’s see how this can be implemented with a Map-Reduce algorithm.

Our example SQL aggregation statement from above cannot be covered by only one Map-Reduce task.

In a first step, the JOIN step is done. The data is sorted by clienttype_id, allowing each reducer to emit the balance for each segment. In other words, the first Map-Reduce task does the join of the Client and ClientType table, but is not doing the grouping:

 

join task
join task

 

The second Map-Reduce task sorts the data by segment so that all rows for a particular segment are being emitted to the same Reducer. The reducer does the summarizing needed in the GROUP BY statement.

grouping

Above example showed, how RDBMS functionality can be designed in Map Reduce. It helps to understand in a better way, what we are relating to, when analyzing the ability of an RDBMS like Teradata, to push down joining and aggregation functionality to Map Reduce.

Our Reader Score
[Total: 3    Average: 3.7/5]
Teradata Map Reduce – SQL JOIN and GROUP BY written by Roland Wenzlofsky on April 15, 2014 average rating 3.7/5 - 3 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here