YARN Hadoop – Yet Another Resource Negotiator
Introduction to YARN Hadoop:
YARN stands for Yet Another Resource Negotiator. As the name suggests, it deals with resources and how they are negotiated. YARN is the resource manager for the processing layer of Hadoop 2.0.
Do you know? In Hadoop 1.0, the “Job Tracker” was responsible for managing the jobs submitted to Hadoop, but it had functional and performance drawbacks, so YARN was introduced in Hadoop 2.0 to overcome them.
Why was YARN introduced in Hadoop 2?
- To understand YARN better, we need to discuss why it was introduced in Hadoop 2.
- In Hadoop 1.0, the Job Tracker did what YARN does now. Its responsibilities were as follows:
Hadoop 1.0 – Job Tracker responsibilities:
- The Job Tracker accepts the jobs submitted to Hadoop.
- It finds out where the input data is stored with the help of the NameNode.
- It assigns the job's tasks to the Task Trackers running on the DataNodes, preferring nodes that already hold the data.
- It monitors all the Task Trackers and reschedules tasks when one fails (a DataNode may crash in the middle of a task).
- Once all tasks complete successfully, it reports the job's result back to the client. (The combined output itself is produced by the reduce tasks and written to HDFS.)
- A single Job Tracker handling thousands of jobs in parallel creates a lot of network traffic and slows Hadoop down. This is an architectural problem of Hadoop 1.
- Resource allocation and management were poor, and the Job Tracker was overburdened.
- The Job Tracker supported only MapReduce as its data processing model, with jobs typically written in Java.
- As a result, the Job Tracker was the biggest bottleneck in Hadoop 1.0.
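The Hadoop 1.0 flow above can be sketched as a toy model in Python. Everything in it is hypothetical (the `TaskTracker` and `JobTracker` classes, the `schedule` and `handle_failure` methods, and the block and host names); it is not Hadoop's actual code or API. It illustrates two of the listed steps: using block locations, as the NameNode would report them, to place tasks on nodes that hold the data, and rescheduling tasks when a node fails.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    node: str        # DataNode host this TaskTracker runs on
    free_slots: int  # Hadoop 1 had separate map/reduce slots; simplified to one pool here
    alive: bool = True


@dataclass
class JobTracker:
    # Hypothetical stand-in for the block locations the NameNode reports: block id -> replica hosts
    block_locations: dict
    trackers: dict                                   # host -> TaskTracker
    assignments: dict = field(default_factory=dict)  # block id -> host running its map task

    def _usable(self, host):
        tracker = self.trackers.get(host)
        return tracker is not None and tracker.alive and tracker.free_slots > 0

    def schedule(self, blocks):
        """Assign one map task per block, preferring a host that stores the block."""
        for block in blocks:
            local = [h for h in self.block_locations.get(block, []) if self._usable(h)]
            candidates = local or [h for h in self.trackers if self._usable(h)]
            if not candidates:
                raise RuntimeError(f"no free slot for {block}")
            host = candidates[0]
            self.trackers[host].free_slots -= 1
            self.assignments[block] = host
        return dict(self.assignments)

    def handle_failure(self, host):
        """Mark a TaskTracker as dead and reschedule every task it was running."""
        self.trackers[host].alive = False
        lost = [b for b, h in self.assignments.items() if h == host]
        for block in lost:
            del self.assignments[block]
        return self.schedule(lost)


trackers = {h: TaskTracker(h, free_slots=2) for h in ("dn1", "dn2", "dn3")}
jt = JobTracker(
    block_locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"], "blk_3": ["dn3", "dn1"]},
    trackers=trackers,
)
print(jt.schedule(["blk_1", "blk_2", "blk_3"]))  # {'blk_1': 'dn1', 'blk_2': 'dn2', 'blk_3': 'dn3'}
print(jt.handle_failure("dn2"))                  # {'blk_1': 'dn1', 'blk_3': 'dn3', 'blk_2': 'dn3'}
```

Every placement and recovery decision in this sketch passes through the single `JobTracker` object. In a real Hadoop 1 cluster that one process also tracked the progress of every running job, which is the bottleneck described above.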
Hadoop 2 – YARN:
- YARN, introduced in Hadoop 2.x, allows different types of data processing engines, such as interactive processing, stream processing, and graph processing, to run alongside batch processing.
- YARN takes over the responsibilities of resource management and job scheduling. Its basic advantage is that it opens Hadoop to many other technologies instead of limiting it to Java and MapReduce.
- In this way, other technologies can take advantage of HDFS, Hadoop's reliable and widely used storage system. Hadoop 2 therefore provides a general-purpose data processing platform.
- It allows Hadoop to work with other purpose-built data processing systems such as HBase, Tez, Giraph, and Spark, so the same hardware on which Hadoop is deployed can run several different frameworks.
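A similar toy model can illustrate the split YARN introduces. The class names borrow the names of real YARN components (ResourceManager, NodeManager, ApplicationMaster, container), but the classes, methods, and numbers below are hypothetical and greatly simplified; this is not the YARN API. The sketch assumes that the resource manager only hands out memory and CPU, while framework-specific logic lives in a per-application master, so a MapReduce job and a Spark job can share the same nodes.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    host: str
    memory_mb: int  # resources still free on this node
    vcores: int


@dataclass
class Container:
    app_id: str
    host: str
    memory_mb: int
    vcores: int


class ResourceManager:
    """Tracks free resources and grants containers; knows nothing about MapReduce or Spark."""

    def __init__(self, nodes):
        self.nodes = {n.host: n for n in nodes}
        self.containers = []

    def allocate(self, app_id, memory_mb, vcores, preferred_hosts=()):
        # Try the hosts the application asked for (e.g. for data locality) first, then any host.
        order = list(preferred_hosts) + [h for h in self.nodes if h not in preferred_hosts]
        for host in order:
            node = self.nodes.get(host)
            if node is not None and node.memory_mb >= memory_mb and node.vcores >= vcores:
                node.memory_mb -= memory_mb
                node.vcores -= vcores
                container = Container(app_id, host, memory_mb, vcores)
                self.containers.append(container)
                return container
        return None  # no node can fit the request; a real scheduler would queue it


class ApplicationMaster:
    """Hypothetical per-application coordinator: framework-specific logic would live here."""

    def __init__(self, app_id, framework, rm):
        self.app_id = app_id
        self.framework = framework  # e.g. "MapReduce" or "Spark"; only a label in this sketch
        self.rm = rm

    def request(self, count, memory_mb, vcores, preferred_hosts=()):
        granted = []
        for _ in range(count):
            container = self.rm.allocate(self.app_id, memory_mb, vcores, preferred_hosts)
            if container is None:
                break
            granted.append(container)
        return granted


rm = ResourceManager([NodeManager("dn1", 8192, 8), NodeManager("dn2", 8192, 8)])
mapreduce = ApplicationMaster("app_1", "MapReduce", rm)
spark = ApplicationMaster("app_2", "Spark", rm)
maps = mapreduce.request(2, memory_mb=1024, vcores=1, preferred_hosts=["dn1"])
executors = spark.request(2, memory_mb=4096, vcores=2)
print([c.host for c in maps])       # ['dn1', 'dn1']
print([c.host for c in executors])  # ['dn1', 'dn2']
```

Because the resource manager in this sketch never needs to know what a container will run, adding a new processing engine means writing a new application master rather than changing the resource manager. That is the kind of generality the points above describe.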
That covers the basics of YARN (Yet Another Resource Negotiator). In the next section, we will learn about the Apache YARN Hadoop architecture.