Hadoop MapReduce Tutorials

In this part of the Hadoop MapReduce Tutorials, we will cover:

  • What is MapReduce?
  • What are its features?  
  • Why MapReduce?
  • What is Big Data MapReduce?

Introduction to MapReduce

MapReduce is the processing layer of Hadoop. For any data to be processed on a machine, it first has to be stored and then processed to produce an output. In Hadoop:

  • the data storage part is handled by HDFS, and
  • the processing part is handled by MapReduce.

Hadoop MapReduce is a widely used framework for processing large data sets in batches.

The MapReduce framework performs two major tasks:

  1. Map
  2. Reduce

MapReduce is capable of dealing with large volumes of data. It works on almost the same principle as HDFS. MapReduce is a programming model that divides a job into a set of independent tasks, which allows it to process large volumes of data in parallel.
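The idea of dividing work into independent tasks can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: the function names, the chunk size, and the use of threads as a stand-in for separate machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the input into independent chunks, one per task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """An independent task: it needs nothing except its own chunk."""
    return sum(chunk)


def run_in_parallel(records, chunk_size=3, workers=4):
    """Run one task per chunk in parallel, then combine the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    # Threads stand in for the worker machines of a real cluster (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


total = run_in_parallel(list(range(1, 11)))
print(total)
```

Because no task depends on another task's chunk, the tasks can run in any order and on any worker, which is what makes parallel execution possible.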

The master-slave concept applies to MapReduce as well. A complete job is submitted to the master, which divides it into multiple small tasks and sends them to the slaves. The master schedules the tasks, monitors them, and re-executes any that fail. The individual outputs of the small tasks are then gathered and combined to produce the final output.
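The scheduling and re-execution behaviour described above can be sketched as follows. This is a simplified, single-process illustration, not the Hadoop scheduler: `master`, `make_flaky_worker`, and the `max_attempts` limit are hypothetical names chosen for the example, and the "failure" is simulated.

```python
def make_flaky_worker(fail_once_for):
    """Return a hypothetical worker that fails the first attempt of the given task ids."""
    already_failed = set()

    def worker(task_id, chunk):
        if task_id in fail_once_for and task_id not in already_failed:
            already_failed.add(task_id)
            raise RuntimeError(f"task {task_id} failed")
        return sum(chunk)

    return worker


def master(job, chunk_size, worker, max_attempts=3):
    """Divide the job into tasks, run them, and re-execute tasks that fail."""
    tasks = {
        task_id: job[start:start + chunk_size]
        for task_id, start in enumerate(range(0, len(job), chunk_size))
    }
    results, attempts = {}, {}
    pending = list(tasks)
    while pending:
        task_id = pending.pop(0)
        attempts[task_id] = attempts.get(task_id, 0) + 1
        try:
            results[task_id] = worker(task_id, tasks[task_id])
        except RuntimeError:
            if attempts[task_id] >= max_attempts:
                raise  # give up after too many failed attempts
            pending.append(task_id)  # schedule the failed task again
    # Gather the individual task outputs into the final output.
    return sum(results[task_id] for task_id in sorted(results)), attempts


final_output, attempts = master(list(range(1, 11)), 4, make_flaky_worker({1}))
print(final_output, attempts)
```

In this run, task 1 fails once and is executed a second time, while the other tasks succeed on their first attempt; the final output is the same as if nothing had failed.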

More About MapReduce:

In the Apache Hadoop framework, the compute nodes and the storage nodes are the same: the data nodes that act as slaves for data storage also handle the data processing. Tasks are therefore scheduled on the nodes where the data is present.
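A minimal sketch of this data-local scheduling idea is shown below. It is a toy greedy scheduler, not Hadoop's actual algorithm: the `schedule` function, the block and node names, and the notion of "free slots" per node are assumptions made for illustration.

```python
def schedule(block_locations, free_slots):
    """Assign each block's task to a node that stores the block when possible.

    block_locations: maps a block name to the nodes that hold a copy of it.
    free_slots: maps a node name to the number of tasks it can still accept.
    Returns a map from block name to (chosen node, whether the choice is data-local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block, nodes in block_locations.items():
        local_candidates = [node for node in nodes if slots.get(node, 0) > 0]
        if local_candidates:
            chosen = local_candidates[0]
        else:
            # No node holding the data is free: fall back to any free node.
            chosen = next((node for node, free in slots.items() if free > 0), None)
            if chosen is None:
                raise RuntimeError("no free slots left in the cluster")
        slots[chosen] -= 1
        assignment[block] = (chosen, chosen in nodes)
    return assignment


# Hypothetical cluster layout used only for this example.
block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a"],
    "block-3": ["node-c"],
}
free_slots = {"node-a": 1, "node-b": 1, "node-c": 1}

plan = schedule(block_locations, free_slots)
print(plan)
```

Here the tasks for block-1 and block-3 run on nodes that already store their data, while block-2 falls back to a remote node because the only node holding it has no free slot left.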

  • Easy scalability is one of the biggest advantages of MapReduce.
  • An application written in MapReduce to run on a hundred machines can scale to run on hundreds or thousands of machines in a cluster merely by changing the configuration.
  • MapReduce programs process lists of data using functional programming constructs and idioms. The input is provided as a list, and the processed output is also a list.
  • MapReduce acts as the heart of Hadoop. Its parallel processing capability makes it an efficient data processing tool: many small machines can together process a data set that is too large for a single large machine.
  • In classic MapReduce (Hadoop 1.x), there is a single master, the JobTracker. The master is responsible for scheduling tasks on the slaves, monitoring them, and re-executing failed tasks.
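The "list in, list out" functional style mentioned in the list above can be illustrated with Python's built-in `map` and `functools.reduce`. This is only an analogy for the programming style, not the Hadoop API; the sample words are made up for the example.

```python
from functools import reduce

words = ["hadoop", "map", "reduce"]

# map: apply the same function to every element, producing a new list.
lengths = list(map(len, words))

# reduce: fold the list of intermediate values into a single result.
total_length = reduce(lambda left, right: left + right, lengths, 0)

print(lengths, total_length)
```

Neither step modifies the input list; each one produces new data from old data. That absence of shared, mutable state is what lets a framework run the same function on many machines at once.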

Overview of Apache Hadoop MapReduce Architecture:

Let’s try to understand the basics of the Hadoop MapReduce architecture.

Hadoop MapReduce works on the principle of sending the processing task to where the data already resides.

  • It consists of two major stages, Map and Reduce,
  • with Shuffle and Sort phases in between.

The input data is divided into multiple chunks, and each chunk is processed by a Map task. The Reduce stage then processes the data that comes out of the Mappers and produces the final output, which is stored in HDFS.

Apache Hadoop MapReduce Architecture

Simple Word Count MapReduce Example

The diagram below summarizes how MapReduce works in Hadoop.

MapReduce Example
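As a complement to the diagram, the word count flow (Map, then Shuffle and Sort, then Reduce) can be sketched in plain Python. This is a single-process illustration of the model rather than Hadoop code: the function names and the sample input lines are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Group all values by key and order the keys, as happens between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: combine all the values emitted for one word into a single count."""
    return (key, sum(values))


def word_count(lines):
    """Run the three stages in order over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


# Hypothetical input; in Hadoop each line would come from a chunk stored in HDFS.
sample_lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
counts = word_count(sample_lines)
print(counts)
```

Each call to `map_phase` looks at only one line, and each call to `reduce_phase` looks at only one word, so in a real cluster both stages can be spread across many machines.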


We hope this simple example helps you understand the Hadoop MapReduce architecture. We will discuss everything in detail later.