What is Apache Hadoop? An Introduction

Learn Apache Hadoop from the basics

This Apache Hadoop introduction is very important, as it creates the base for learning Apache Hadoop.

Do you know?
1. Apache Hadoop is easy to learn, and anyone can switch their profile to the big data domain.
2. Why did Apache Hadoop come about?
3. What is the need for Apache Hadoop?
4. What problems are solved by using Apache Hadoop?
5. Big data engineers can earn as much as twice the salary of other engineers.

Many engineers have questions about big data and Hadoop, and often search for the "difference between Big Data and Apache Hadoop". Don't get confused: Big Data is just a generic term for very large amounts of data, while Apache Hadoop is a tool that can store and process that very large amount of data.

What is Apache Hadoop?

Apache Hadoop is an open source tool that provides an efficient solution for processing large amounts of data. That means if you have data in GBs or TBs, such as a very large log file, you can extract information from it using Apache Hadoop.


Real-life big data example using Hadoop: suppose you have a 50 GB log file from your application and you need to know how many times an error occurred in one day. Using Hadoop, you can write a program that counts the errors that occurred in that day.
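Conceptually, that error count is a map step (emit a count for every error line) followed by a reduce step (sum the counts). Here is a minimal sketch of the idea in plain Python; a real Hadoop job would typically be written in Java against the Hadoop API, and the log lines below are hypothetical stand-ins for the 50 GB file:

```python
from collections import Counter

def map_phase(lines):
    """Map: emit ("ERROR", 1) for every log line that contains an error."""
    for line in lines:
        if "ERROR" in line:
            yield ("ERROR", 1)

def reduce_phase(pairs):
    """Reduce: sum up the counts for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Hypothetical log lines standing in for a huge application log.
log = [
    "2024-01-01 10:00 INFO  app started",
    "2024-01-01 10:05 ERROR db connection lost",
    "2024-01-01 10:06 ERROR retry failed",
    "2024-01-01 10:10 INFO  recovered",
]
print(reduce_phase(map_phase(log)))  # {'ERROR': 2}
```

Hadoop runs many copies of the map step in parallel on different chunks of the file, which is what makes the same idea work at the 50 GB scale.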

Can you open a 50 GB log file in Notepad or any other editor? Opening a file of that size is not easy, and reading it is even harder.
Do you know? A file like that can be processed using Apache Hadoop within minutes on ordinary computers. Be patient; how this is done will be shown with practical work later.

Key Points for Hadoop

The key points below give you an overview for this Apache Hadoop introduction:

  • Apache Hadoop is an open source technology
  • Applications for Apache Hadoop can be written in Java and many other technologies.
  • Apache Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment. That environment will be discussed in detail later.
  • Apache Hadoop's file system is based on ideas from GFS (Google File System).

Basic Architecture for Apache Hadoop Introduction: 

Just get an overview as part of this Apache Hadoop introduction. Hadoop mainly consists of two components: one for data storage, called HDFS, and one for data processing, called MapReduce.

HDFS: Hadoop has a distributed file system, called the Hadoop Distributed File System (HDFS), which is used for storing the data and enables fast data transfer among the nodes.

MapReduce: Hadoop has a distributed computation framework called MapReduce. Using MapReduce, Apache Hadoop can process the data stored in HDFS.

Hope you understood this Apache Hadoop introduction. Do not worry about the details of the Apache Hadoop architecture for now; they will be covered later.
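To make the MapReduce component a little more concrete, the classic example is word counting: a map phase emits a pair per word, the framework groups pairs by key (the "shuffle"), and a reduce phase sums each group. A minimal plain-Python sketch of those three phases, assuming made-up input lines (real Hadoop jobs are usually Java classes, and the shuffle is done by the framework across machines):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle phase: group pairs by key, as Hadoop does between map and reduce."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reducer(key, values):
    """Reduce phase: sum the counts for one key."""
    return key, sum(values)

lines = ["hadoop stores data", "hadoop processes data"]
pairs = [p for line in lines for p in mapper(line)]
result = dict(reducer(k, vs) for k, vs in shuffle(pairs))
print(result)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In Hadoop the mappers run in parallel on the HDFS blocks of the input file, which is how the same pattern scales to very large data.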