MapReduce is a core feature of Hadoop that numerous database vendors are now integrating into their RDBMS. To illustrate the relationship between a conventional RDBMS and Hadoop, consider the following example.
To show how RDBMS logic can be executed in Hadoop with the MapReduce algorithm, we take a SQL aggregation statement that joins two tables as our example.
[highlight] SELECT
t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]
The above SQL statement summarizes the client balances for each client segment (private, business, etc.).
Let’s see how this can be implemented with a MapReduce algorithm.
This SQL aggregation statement cannot be executed in a single pass; it requires more than one MapReduce task.
First, the JOIN step is executed: the data is sorted by clienttype_id, so that all rows sharing the same key reach the same reducer. Each reducer can then emit the segment together with the corresponding client balance. In other words, the first MapReduce task joins the Client and ClientType tables without carrying out any grouping.
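The join step described above can be sketched in plain Python, without Hadoop itself. This is only an illustration under assumptions: the sample rows, the column layout (client_id, clienttype_id, balance and clienttype_id, segment) and the function names map_join, shuffle and reduce_join are hypothetical and do not come from the original statement.

```python
from collections import defaultdict

# Hypothetical sample rows; the column layout is an assumption.
clients = [  # (client_id, clienttype_id, balance)
    (1, 10, 100.0),
    (2, 20, 250.0),
    (3, 10, 50.0),
]
client_types = [  # (clienttype_id, segment)
    (10, "private"),
    (20, "business"),
]

def map_join(clients, client_types):
    """Map phase: tag each row with its source table and key it by clienttype_id."""
    for _client_id, clienttype_id, balance in clients:
        yield clienttype_id, ("client", balance)
    for clienttype_id, segment in client_types:
        yield clienttype_id, ("type", segment)

def shuffle(pairs):
    """Shuffle phase: group values by key and order the keys, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_join(clienttype_id, values):
    """Reduce phase: pair every balance with the segment that shares its key."""
    segments = [v for tag, v in values if tag == "type"]
    balances = [v for tag, v in values if tag == "client"]
    for segment in segments:
        for balance in balances:
            yield segment, balance

# Run the first task: join only, no grouping yet.
joined = [
    row
    for key, values in shuffle(map_join(clients, client_types))
    for row in reduce_join(key, values)
]
print(joined)
```

Tagging each value with its source table is what lets a single reducer tell Client rows from ClientType rows once they arrive under the same key.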
The second MapReduce task sorts the joined data by segment, so that all rows of a given segment are consolidated in a single reducer. That reducer then summarizes the balances, much as a GROUP BY statement does.
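The grouping step can be sketched the same way in plain Python. The input list below stands in for the (segment, balance) pairs produced by the join; its values are hypothetical, and using a sum as the aggregate is an assumption based on the description of the statement as summarizing balances.

```python
from collections import defaultdict

# Hypothetical output of the join task: (segment, balance) pairs.
joined = [("private", 100.0), ("private", 50.0), ("business", 250.0)]

def map_group(rows):
    """Map phase: emit the segment as key and the balance as value."""
    for segment, balance in rows:
        yield segment, balance

def shuffle(pairs):
    """Shuffle phase: collect all balances of one segment under a single key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_sum(segment, balances):
    """Reduce phase: aggregate the balances of one segment (sum is assumed here)."""
    return segment, sum(balances)

# Run the second task: one reducer call per segment.
totals = dict(reduce_sum(key, values) for key, values in shuffle(map_group(joined)))
print(totals)
```

Because the key is now the segment rather than clienttype_id, the data has to be shuffled a second time, which is why the statement needs two tasks.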
The previous example showed how RDBMS functionality can be implemented with MapReduce. This makes it easier to understand how Teradata can delegate joining and aggregation functionality to MapReduce.