MapReduce is a core feature of Hadoop that several database vendors are now integrating into their RDBMS products.
To show how relational logic is executed in Hadoop with MapReduce, consider a SQL aggregation statement that joins two tables.
[highlight] SELECT Segment, SUM(balance)
FROM Client t01
INNER JOIN ClientType t02
    ON t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]
The SQL statement above totals the client balances for each client segment (private, business, etc.).
Let’s see how this can be implemented with a MapReduce algorithm.
In a straightforward implementation, this SQL aggregation statement requires more than one MapReduce job.
The first job performs the JOIN. The mappers emit the rows of both tables keyed by clienttype_id, and the framework sorts and groups them by that key, so each reducer receives the client rows together with the matching ClientType row. This enables the reducer to emit the segment alongside each client balance. In simpler terms, the first MapReduce job joins the Client and ClientType tables without carrying out any grouping.
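The join job can be sketched in plain Python without Hadoop. This is a minimal, hedged illustration of a reduce-side join, not Hadoop API code: the sample rows, the column layout, and the function names (map_join, shuffle, reduce_join) are assumptions made for the example, and the shuffle function only simulates the sort-and-group step the framework performs between map and reduce.

```python
from collections import defaultdict

# Hypothetical sample rows; the column layout is an assumption for illustration.
clients = [  # (client_id, clienttype_id, balance)
    (1, 10, 100.0),
    (2, 20, 250.0),
    (3, 10, 50.0),
]
client_types = [  # (clienttype_id, segment)
    (10, "private"),
    (20, "business"),
]

def map_join(clients, client_types):
    """Map phase: tag each row with its source table and key it by clienttype_id."""
    for _client_id, clienttype_id, balance in clients:
        yield clienttype_id, ("client", balance)
    for clienttype_id, segment in client_types:
        yield clienttype_id, ("type", segment)

def shuffle(pairs):
    """Simulated shuffle: group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda item: item[0])

def reduce_join(clienttype_id, values):
    """Reduce phase: pair every balance with the segment that shares its key."""
    segments = [v for tag, v in values if tag == "type"]
    balances = [v for tag, v in values if tag == "client"]
    for segment in segments:
        for balance in balances:
            yield segment, balance

# Run the simulated job: map, shuffle, then reduce each key group.
joined = [
    row
    for key, values in shuffle(map_join(clients, client_types))
    for row in reduce_join(key, values)
]
print(joined)
```

The output is a list of (segment, balance) pairs with no grouping applied yet, which is exactly the input the aggregation step needs.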
The second MapReduce job keys the joined data by segment, so that all balances of one segment arrive at the same reducer. That reducer then sums them, which corresponds to the GROUP BY clause with SUM in the SQL statement.
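The aggregation job can be sketched the same way. Again, this is a simplified simulation under stated assumptions rather than Hadoop code: the joined input rows are hypothetical sample data standing in for the output of the join step, and map_by_segment, shuffle, and reduce_sum are illustrative names.

```python
from collections import defaultdict

# Hypothetical output of the join job: (segment, balance) pairs.
joined = [("private", 100.0), ("private", 50.0), ("business", 250.0)]

def map_by_segment(rows):
    """Map phase: emit each balance keyed by its segment."""
    for segment, balance in rows:
        yield segment, balance

def shuffle(pairs):
    """Simulated shuffle: group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda item: item[0])

def reduce_sum(segment, balances):
    """Reduce phase: sum all balances of one segment, like SUM(balance) ... GROUP BY Segment."""
    return segment, sum(balances)

# Run the simulated job and collect one total per segment.
totals = dict(
    reduce_sum(segment, balances)
    for segment, balances in shuffle(map_by_segment(joined))
)
print(totals)
```

Because every balance of a segment is routed to the same key group, each call to reduce_sum sees the complete set of values it needs to produce one output row per segment.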
This example shows how RDBMS functionality can be implemented using MapReduce. It also helps build a clearer understanding of Teradata's capability to delegate joining and aggregation functionality to MapReduce.