Below you can see an illustration of how real-world map-reduce implementations are designed (for example, Hadoop):
The input files are located on a distributed file system. In the case of Hadoop and many others, this is the HDFS. Google calls it the Google File System (GFS).
Worker processes can take over either mapper tasks or reducer tasks.
The mapper processed access pieces of the data located on the HDFS, apply the mapping function, and write the result to their local disks.
Each time a mapper process is ready with a particular reduce key (as you remember, the map phase’s output is key/value pairs), it informs the master process that this key is ready.
As you can see from the below illustration, the master process plays an important role. It assigns the keys which are finished to different reducer processes.
Please notice that each process handling a task can take over either a mapper or a reduce task. They are not specialized.
When the master process informs a reducer process that a particular reduce key is available, it is simultaneously receiving the information from which mapper (and its local disk) the data can be picked up.
Be aware that a reducer task needs to know when it can start to work on a particular reduce key, i.e., all mapper processes have delivered data for this specific key.
Again, the master process plays an essential role as it keeps records and knows if all mapper tasks delivered data for the particular key (which is a prerequisite for the reduce task to start its work on this key).
One significant advantage of the Map-Reduce paradigm and its real-world implementations against traditional database systems is fault tolerance.
Consider a traditional database system like Teradata.
Although many features are built into a Teradata System to increase fault tolerance, it is hard to imagine a Teradata System scaling to hundreds of thousands of processes.
Statistical probability ensures that failure is almost guaranteed in the long run.
Let’s do a simple example:
If the probability for each process to fail is only 1%, the likelihood for one process out of hundreds of thousands to fail is around 100%. This means failure is guaranteed with such a huge system we may see more and more in the Big Data arena.
Fault tolerance is a huge topic for Map Reduce implementations.
In principle, the same statistical rules, as mentioned above, apply for Map Reduce. Imagine a Map-Reduce task spawning hundreds of thousands of mapper and reducer processes. Such a task can run for hours or even days on really huge amounts of data applying complex calculations like pattern recognition.
A master process always keeps communicating regularly with all other processes (mapper or reducer) and records the key ranges that have been successfully processed by each mapper or reducer task.
If a mapper fails, the master task reassigns the complete key range to another mapper, which accesses the HDFS and repeats this key range.
In case a reducer fails, the same logic from the above applies, and a new reducer is assigned to this task. Despite, already written keys to the output do not have to be reprocessed.
As you can see, the master process keeps track of all activities in the Map-Reduce task and handles failures gracefully, even for systems with hundreds of thousands of processes.
Further, even not real failures, the master process detects slow mappers or reducers and takes away the key range they are working upon, assigning it to a new process.
One might assume that the master process is a single point of failure. This is indeed the case, and if the master process fails, the complete Map-Reduce tasks fail.
But again, statistical probabilities have to be considered:
Even if the chance that any of the thousands of processes fail is almost 100%, the risk that exactly this particular master process fails is minimal. Precisely this property makes Map Reduce implementations very fault-tolerant and more suitable for enormous scale tasks than traditional database implementations.