Hadoop 1x Vs Hadoop 2x and Hadoop 2x Vs Hadoop 3x

Let’s compare Hadoop 1x with Hadoop 2x, and then Hadoop 2x with Hadoop 3x.

1. Hadoop 1x Vs Hadoop 2x

Hadoop 1x vs Hadoop 2x

Sl. No | Hadoop 1 | Hadoop 2
1 | Supports only the MapReduce (MR) processing model and no non-MapReduce tools. | Along with MR, supports other processing tools such as Spark, Giraph, HBase, and MPI.
2 | Has no separate component for resource management; MR does both data processing and cluster resource management. | Has a separate component, YARN (Yet Another Resource Negotiator), which handles cluster resource management, while processing is done by the different processing tools.
3 | Scalability is limited to about 4,000 nodes per cluster. | Has a higher scalability limit of up to about 10,000 nodes per cluster.
4 | The entire namespace of the cluster is managed by a single NameNode. | Multiple namespaces are managed by multiple NameNode servers (HDFS Federation).
5 | Works on the concept of slots: machine capacity that can run either a Map task or a Reduce task, but not both. | Works on the concept of containers: machine capacity that can run multiple MapReduce or generic tasks in parallel.
6 | Cannot serve as a platform for event processing, streaming, and real-time operations. | Can serve as a platform for a wide variety of data analytics, including event processing, streaming, and real-time operations.
7 | Has only one NameNode managing the metadata of the cluster. | Has a standby NameNode to overcome the single point of failure (SPOF); if the active NameNode fails, the standby enables automatic recovery.
8 | Programs written with the Hadoop 1 MR API run on Hadoop 1 without any additional files. | Programs written for Hadoop 1x are not directly compatible with Hadoop 2x; they require additional files to run.
9 | A NameNode failure affects the whole stack. | The Hadoop stack (Hive, Pig, HBase, etc.) is equipped to handle a NameNode failure.
10 | Does not support Microsoft Windows. | Adds support for Microsoft Windows.
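
To make row 5 (slots vs. containers) more concrete, here is a small toy model in Python. It is only a sketch and not Hadoop code: the function names, the slot counts, and the memory sizes are hypothetical values chosen for illustration. It shows how fixed Map/Reduce slots can sit idle while generic containers can be handed to whatever work is pending.

```python
# Toy model only: these functions are illustrative and are NOT part of any Hadoop API.

def runnable_with_slots(pending_map, pending_reduce, map_slots, reduce_slots):
    """Hadoop 1 style: a Map slot can only run a Map task and a
    Reduce slot can only run a Reduce task."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)


def runnable_with_containers(pending_tasks, node_memory_mb, container_memory_mb):
    """Hadoop 2 style: containers are generic, so any pending task
    can use any free container (here sized by memory only)."""
    capacity = node_memory_mb // container_memory_mb
    return min(pending_tasks, capacity)


# Hypothetical node: 2 Map slots + 2 Reduce slots, or 4096 MB split into 1024 MB containers.
# Workload: 4 Map tasks are waiting and no Reduce tasks are ready yet.
slots_running = runnable_with_slots(pending_map=4, pending_reduce=0,
                                    map_slots=2, reduce_slots=2)
containers_running = runnable_with_containers(pending_tasks=4,
                                              node_memory_mb=4096,
                                              container_memory_mb=1024)

print("Tasks running with slots:", slots_running)
print("Tasks running with containers:", containers_running)
```

In this made-up scenario the two Reduce slots stay idle because no Reduce task is ready, while the container model can use the whole node for the waiting Map tasks.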

2. Hadoop 2x Vs Hadoop 3x

 

Hadoop 2x vs Hadoop 3x

Sl. No | Hadoop 2 | Hadoop 3
1 | Lowest Java version supported is Java 7. | Lowest Java version supported is Java 8.
2 | Apache License 2.0, open source. | Apache License 2.0, open source.
3 | Fault tolerance is handled through replication, which wastes storage space. | Fault tolerance can be handled by erasure coding, which takes less space than replication in Hadoop 2.
4 | HDFS Balancer is used for data balancing. | Intra-DataNode balancer is also available, balancing data across the disks within a DataNode.
5 | Uses a 3x replication factor for data storage. | Supports erasure coding in HDFS.
6 | HDFS has a 200% storage overhead. | HDFS has only about a 50% storage overhead with erasure coding.
7 | With 3x replication, 6 blocks of data occupy the space of 18 blocks. | 6 blocks of data occupy 9 blocks: 6 blocks for the actual data and 3 blocks for parity.
8 | Has scalability issues due to the old YARN Timeline Service. | Improved YARN Timeline Service addresses these scalability and reliability issues.
9 | Has a lower scalability limit: up to about 10,000 nodes per cluster. | Has a higher scalability limit: beyond 10,000 nodes per cluster.
10 | Some default ports fall within the Linux ephemeral port range. | These ports have been moved out of the ephemeral range.
11 | Supports HDFS, FTP, Amazon S3, and Windows Azure Storage Blobs (WASB) file systems. | Also supports the Microsoft Azure Data Lake file system.
12 | Along with MR, supports other processing tools such as Spark, Giraph, HBase, and MPI. | Also supports other processing tools such as Spark, Giraph, HBase, and MPI.
13 | Uses YARN for cluster resource management. | Also uses YARN for cluster resource management.
14 | Multiple namespaces are managed by multiple NameNode servers. | Multiple namespaces are managed by multiple NameNode servers.
15 | Works on the concept of containers: machine capacity that can run multiple MapReduce or generic tasks in parallel. | Also processes data using containers.
16 | Has a standby NameNode to overcome the SPOF, with support for automatic recovery. | Has a standby NameNode to overcome the SPOF, with support for automatic recovery.
17 | Programs using the Hadoop 1 MR API can run on Hadoop 2, but a program written for Hadoop 1x requires additional files to run on Hadoop 2x. | Programs using the Hadoop 1 MR API can also run on Hadoop 3, but a program written for Hadoop 1x requires additional files to run on Hadoop 3x.
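
The storage numbers in rows 5 to 7 can be checked with a little arithmetic. The Python sketch below is only an illustration: the function names are made up for this post, and it assumes an erasure coding layout of 6 data blocks plus 3 parity blocks per group, which is what the "6 blocks occupy 9 blocks" row describes.

```python
# Illustrative arithmetic only; these helpers are not part of Hadoop.

def raw_blocks_replication(data_blocks, replication_factor=3):
    """Every data block is stored replication_factor times."""
    return data_blocks * replication_factor


def raw_blocks_erasure_coding(data_blocks, data_units=6, parity_units=3):
    """Each group of up to data_units data blocks gets parity_units parity blocks
    (assumed 6 data + 3 parity layout)."""
    groups = -(-data_blocks // data_units)  # ceiling division
    return data_blocks + groups * parity_units


def overhead_percent(data_blocks, raw_blocks):
    """Extra storage used, as a percentage of the actual data size."""
    return 100 * (raw_blocks - data_blocks) / data_blocks


data = 6
replicated = raw_blocks_replication(data)
erasure_coded = raw_blocks_erasure_coding(data)

print("3x replication:", replicated, "blocks,",
      overhead_percent(data, replicated), "% overhead")
print("Erasure coding:", erasure_coded, "blocks,",
      overhead_percent(data, erasure_coded), "% overhead")
```

For 6 data blocks this gives 18 blocks (200% overhead) with 3x replication and 9 blocks (50% overhead) with erasure coding, matching the table.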

 

“That’s all about the Hadoop 1x Vs Hadoop 2x and Hadoop 2x Vs Hadoop 3x”