Understanding the Relationship Between RDBMS and Hadoop: A MapReduce Example

Roland Wenzlofsky

April 21, 2023


MapReduce is a core feature of Hadoop, and numerous database vendors are now integrating it into their RDBMS. To illustrate the relationship between a conventional RDBMS and Hadoop, consider the following example.

To show how RDBMS logic can be executed in Hadoop with the MapReduce algorithm, we use a SQL aggregation statement that joins two tables.

[highlight] SELECT
    Segment,
    SUM(balance)
FROM
    Client t01
INNER JOIN
    ClientType t02
ON
    t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]

The SQL statement above totals the client balances for each client segment (private, business, etc.).
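To have a reference result for the MapReduce steps, the query can be run on a few rows of toy data. The following minimal sketch uses Python's standard sqlite3 module. The table contents, the client_id column, and the assumption that Segment lives in ClientType while balance lives in Client are illustrative only and not taken from a real schema.

```python
import sqlite3

# Hypothetical toy data. Assumption: Segment is a column of ClientType,
# balance is a column of Client; client_id is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ClientType (clienttype_id INTEGER PRIMARY KEY, Segment TEXT);
    CREATE TABLE Client (client_id INTEGER PRIMARY KEY,
                         clienttype_id INTEGER, balance REAL);
    INSERT INTO ClientType VALUES (1, 'private'), (2, 'business');
    INSERT INTO Client VALUES (10, 1, 100.0), (11, 1, 250.0), (12, 2, 1000.0);
""")

# Same join and aggregation as the article's query; ORDER BY only makes
# the output order deterministic.
rows = conn.execute("""
    SELECT t02.Segment, SUM(t01.balance)
    FROM Client t01
    INNER JOIN ClientType t02
      ON t01.clienttype_id = t02.clienttype_id
    GROUP BY t02.Segment
    ORDER BY t02.Segment
""").fetchall()
print(rows)  # [('business', 1000.0), ('private', 350.0)]
```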

Let’s see how this can be implemented with a MapReduce algorithm.

This SQL aggregation statement cannot be executed in a single MapReduce job; it requires two jobs run one after the other, one for the join and one for the aggregation.

First, the join is executed: the rows of both tables are keyed by clienttype_id, and the shuffle phase sorts and groups them by this key. Each reducer therefore receives a client type together with all clients of that type and can emit each client's balance along with its segment. In other words, the first MapReduce job performs the join between the Client and ClientType tables without any grouping.
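The join job can be simulated in memory as a rough sketch of a reduce-side join, assuming a few hypothetical toy rows (client_id, clienttype_id, balance for Client; clienttype_id, segment for ClientType). The function names join_map, shuffle and join_reduce are made up for this illustration; in real Hadoop the job would be written as Mapper and Reducer classes and the framework would perform the shuffle.

```python
from collections import defaultdict

# Hypothetical input rows.
client = [(10, 1, 100.0), (11, 1, 250.0), (12, 2, 1000.0)]  # (client_id, clienttype_id, balance)
client_type = [(1, "private"), (2, "business")]             # (clienttype_id, segment)

def join_map():
    """Map phase: key every row by clienttype_id and tag its source table."""
    for _client_id, type_id, balance in client:
        yield type_id, ("C", balance)
    for type_id, segment in client_type:
        yield type_id, ("T", segment)

def shuffle(pairs):
    """Shuffle/sort phase: group values by key and return keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def join_reduce(type_id, values):
    """Reduce phase: one call per clienttype_id, emitting (segment, balance) pairs."""
    segments = [v for tag, v in values if tag == "T"]
    balances = [v for tag, v in values if tag == "C"]
    for segment in segments:  # inner join: a key without a segment emits nothing
        for balance in balances:
            yield segment, balance

joined = [out for key, vals in shuffle(join_map()) for out in join_reduce(key, vals)]
print(joined)  # [('private', 100.0), ('private', 250.0), ('business', 1000.0)]
```

Tagging each value with its source table ("C" or "T") is what lets the reducer tell the two sides of the join apart, since both arrive under the same key.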

The second MapReduce job sorts the output of the first job by segment, so that all balances belonging to one segment arrive at the same reducer, which sums them. This step corresponds to the aggregation performed by GROUP BY.
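The second job can be sketched as a small in-memory Python simulation. The input list stands in for the (segment, balance) pairs a join job would produce; the values and the function names agg_map, shuffle and agg_reduce are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of the join job: (segment, balance) pairs.
joined = [("private", 100.0), ("private", 250.0), ("business", 1000.0)]

def agg_map(records):
    """Map phase: the segment becomes the key, the balance the value."""
    for segment, balance in records:
        yield segment, balance

def shuffle(pairs):
    """Shuffle/sort phase: group values by key and return keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def agg_reduce(segment, balances):
    """Reduce phase: all balances of one segment arrive together, like GROUP BY."""
    return segment, sum(balances)

result = [agg_reduce(seg, bals) for seg, bals in shuffle(agg_map(joined))]
print(result)  # [('business', 1000.0), ('private', 350.0)]
```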

The example above showed how RDBMS functionality can be implemented with MapReduce. It also makes it easier to understand how Teradata can delegate join and aggregation functionality to MapReduce.