Map-Reduce is one of Hadoop's core features, and most database vendors are adding it to their RDBMS. The following example should help you understand how a traditional RDBMS relates to Hadoop.

As an example, we use a SQL aggregation statement that joins two tables. Based on this example, we will show how logic can be pushed down to the “Hadoop implementation” of the RDBMS, where it is handled by the Map-Reduce algorithm:

[highlight] SELECT
Segment,
SUM(balance)
FROM
Client t01
INNER JOIN
ClientType t02
ON
t01.clienttype_id = t02.clienttype_id
GROUP BY Segment;[/highlight]

The SQL statement above sums the client balances for each client segment (private, business, etc.).
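
To make the expected result concrete, here is a minimal sketch that runs the same statement against an in-memory SQLite database. The sample rows, the column types, and the `client_id` column are assumptions made for illustration only; the `ORDER BY` clause is added purely to get a stable row order.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the article).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ClientType (clienttype_id INTEGER PRIMARY KEY, Segment TEXT);
CREATE TABLE Client (client_id INTEGER PRIMARY KEY, clienttype_id INTEGER, balance REAL);
INSERT INTO ClientType VALUES (10, 'private'), (20, 'business');
INSERT INTO Client VALUES (1, 10, 100.0), (2, 20, 250.0), (3, 10, 50.0);
""")

# The article's statement; ORDER BY is added only for a deterministic result.
rows = conn.execute("""
SELECT Segment, SUM(balance)
FROM Client t01
INNER JOIN ClientType t02
ON t01.clienttype_id = t02.clienttype_id
GROUP BY Segment
ORDER BY Segment
""").fetchall()
conn.close()

print(rows)  # [('business', 250.0), ('private', 150.0)]
```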

Let’s see how this can be implemented with a Map-Reduce algorithm.

A single Map-Reduce task cannot cover our example SQL aggregation statement; it takes two.

The first step performs the JOIN. The data is sorted by clienttype_id, so each reducer receives all rows for a given client type and can emit each balance together with its segment. In other words, the first Map-Reduce task joins the Client and ClientType tables but does not do the grouping:

[Figure: join task]
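
The join task can be sketched in plain Python as a reduce-side join. This is a single-process simulation under stated assumptions, not how Hadoop or any RDBMS implements it: the sample rows, the tags "C" and "T", and the helper names `map_join`, `shuffle`, `reduce_join`, and `run_join` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the column layout is assumed for illustration.
clients = [
    {"client_id": 1, "clienttype_id": 10, "balance": 100.0},
    {"client_id": 2, "clienttype_id": 20, "balance": 250.0},
    {"client_id": 3, "clienttype_id": 10, "balance": 50.0},
]
client_types = [
    {"clienttype_id": 10, "segment": "private"},
    {"clienttype_id": 20, "segment": "business"},
]

def map_join(table_name, row):
    """Map: key every row by the join column and tag it with its source table."""
    if table_name == "Client":
        yield row["clienttype_id"], ("C", row["balance"])
    else:
        yield row["clienttype_id"], ("T", row["segment"])

def shuffle(pairs):
    """Stand-in for the framework's sort/shuffle: group values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_join(clienttype_id, values):
    """Reduce: pair each balance with the segment sharing this clienttype_id."""
    segments = [v for tag, v in values if tag == "T"]
    balances = [v for tag, v in values if tag == "C"]
    # Inner join: a key missing from either table emits nothing.
    for segment in segments:
        for balance in balances:
            yield segment, balance

def run_join(clients, client_types):
    mapped = []
    for row in clients:
        mapped.extend(map_join("Client", row))
    for row in client_types:
        mapped.extend(map_join("ClientType", row))
    joined = []
    for key, values in shuffle(mapped):
        joined.extend(reduce_join(key, values))
    return joined

joined_rows = run_join(clients, client_types)
print(joined_rows)  # [('private', 100.0), ('private', 50.0), ('business', 250.0)]
```

Note that the output still contains one row per client: the balances are attached to their segment, but nothing has been summed yet.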

The second Map-Reduce task sorts the data by segment so that all rows for a particular segment are sent to the same reducer. The reducer then performs the summation required by the GROUP BY clause.

[Figure: grouping]
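
The grouping task can be sketched the same way. Again, this is a single-process simulation under assumptions: the input list stands in for (segment, balance) pairs produced by a preceding join, and the helper names `map_group`, `shuffle`, `reduce_sum`, and `run_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical joined input: one (segment, balance) pair per client.
joined_rows = [("private", 100.0), ("private", 50.0), ("business", 250.0)]

def map_group(segment, balance):
    """Map: emit the GROUP BY column as the key and the balance as the value."""
    yield segment, balance

def shuffle(pairs):
    """Stand-in for the framework's sort/shuffle: group values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_sum(segment, balances):
    """Reduce: all balances of one segment arrive together, so SUM is a local sum."""
    return segment, sum(balances)

def run_group(rows):
    mapped = [pair for segment, balance in rows for pair in map_group(segment, balance)]
    return [reduce_sum(segment, balances) for segment, balances in shuffle(mapped)]

totals = run_group(joined_rows)
print(totals)  # [('business', 250.0), ('private', 150.0)]
```

Because the shuffle delivers every row of a segment to the same reducer, each reducer can compute its sum without seeing any other segment's data.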

The example above showed how RDBMS functionality can be implemented in Map-Reduce. It makes it easier to understand what is meant when we analyze the ability of an RDBMS such as Teradata to push down join and aggregation functionality to Map-Reduce.


Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital, Vienna, he has been building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail, and all the tips and tricks that will help you succeed.
