Apache HDFS Overview
Do you know what HDFS is in Hadoop? Let's understand everything with this Apache HDFS overview. These Apache HDFS tutorials also cover the basics of the Hadoop Distributed File System.
Apache HDFS is primarily used by Hadoop applications for distributed storage. It is one of the most reliable large-scale storage systems in use today.
The basic purpose of HDFS is to divide a huge amount of data into blocks, store those blocks across multiple machines, and provide easy access to the resulting data sets.
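The block-splitting idea can be sketched in a few lines. This is an illustrative model, not Hadoop source code; the 128 MB figure is the default `dfs.blocksize` in modern Hadoop releases.

```python
# Illustrative sketch: how a file's size maps onto fixed-size HDFS blocks.
BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize (128 MB)

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (block_index, block_length) pairs covering the whole file."""
    blocks = []
    offset, index = 0, 0
    while offset < file_size_bytes:
        length = min(block_size, file_size_bytes - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
print(split_into_blocks(300 * 1024 * 1024))
```

Note that the last block is allowed to be smaller than the block size; HDFS does not pad files out to a full block.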
- HDFS makes sure that no data is lost in case of system failure and makes the data available for parallel processing.
- HDFS is written entirely in Java and is based on the Google File System (GFS).
As we have seen earlier in our topic "Hadoop Architecture", HDFS works on the master-slave principle. An HDFS cluster has one master, the NameNode, which manages the file system metadata, and multiple slaves, the DataNodes, which store the actual data. The client contacts the NameNode for file metadata and then performs input/output directly with the DataNodes.
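The read path described above can be sketched as a toy model. The class and method names below (`NameNode.get_block_locations`, `DataNode.read_block`) are simplified stand-ins for illustration, not the real Hadoop API:

```python
# Hypothetical in-memory sketch of the HDFS read path: the client asks the
# NameNode for block locations, then reads block data from DataNodes directly.
class NameNode:
    def __init__(self):
        self.metadata = {}  # file path -> list of (block_id, [datanode hosts])

    def add_file(self, path, block_locations):
        self.metadata[path] = block_locations

    def get_block_locations(self, path):
        return self.metadata[path]

class DataNode:
    def __init__(self, host):
        self.host = host
        self.blocks = {}  # block_id -> raw bytes

    def read_block(self, block_id):
        return self.blocks[block_id]

# Client flow: metadata from the NameNode, bulk data from the DataNodes.
namenode = NameNode()
dn1 = DataNode("dn1"); dn1.blocks["blk_1"] = b"hello "
dn2 = DataNode("dn2"); dn2.blocks["blk_2"] = b"hdfs"
datanodes = {"dn1": dn1, "dn2": dn2}
namenode.add_file("/user/data.txt", [("blk_1", ["dn1"]), ("blk_2", ["dn2"])])

content = b""
for block_id, hosts in namenode.get_block_locations("/user/data.txt"):
    content += datanodes[hosts[0]].read_block(block_id)
print(content.decode())  # hello hdfs
```

The key design point is that file *contents* never flow through the NameNode; it serves only small metadata responses, which is what lets one master coordinate a very large cluster.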
- Apache HDFS clusters in production have scaled to hundreds of petabytes of data.
- Its write-once, read-many model enables high-throughput data access.
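The write-once, read-many model means that once a file is closed it cannot be overwritten, only read (or, in later Hadoop versions, appended to). A minimal sketch of these semantics, assuming a simplified file object invented here for illustration:

```python
# Minimal sketch of write-once, read-many semantics: after close(),
# further writes are rejected while reads remain unrestricted.
class WormFile:
    def __init__(self):
        self._data = bytearray()
        self._closed = False

    def write(self, chunk: bytes):
        if self._closed:
            raise PermissionError("write-once: file is closed, no overwrite")
        self._data.extend(chunk)

    def close(self):
        self._closed = True

    def read(self):
        return bytes(self._data)

f = WormFile()
f.write(b"log line 1\n")
f.close()
print(f.read())          # reads are always allowed
```

Giving up random in-place writes is what lets HDFS stream large blocks sequentially at high throughput, which suits batch workloads far better than small random updates.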
Note: The diagram below summarizes the working overview of Apache HDFS.
HDFS has many goals. Here are some of the most notable:
- It is highly fault tolerant: it detects faults and applies automatic, quick recovery.
- Streaming access to data, which suits batch-processing frameworks such as MapReduce.
- Simple and robust coherency model.
- Portability across heterogeneous commodity hardware and operating systems.
- Highly scalable and reliable storage for processing large amounts of data.
- Economy by distributing data and processing across clusters of commodity personal computers
- Reliability by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures
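The reliability goal in the last bullet rests on replication: by default HDFS keeps three copies of every block on different DataNodes. The placement function below is a deliberately naive round-robin sketch; real HDFS placement is rack-aware, which this example does not model.

```python
import itertools

# Sketch of replica placement with the default replication factor of 3:
# each block lands on three distinct DataNodes, so one node failure
# never loses data. (Real HDFS is rack-aware; this round-robin is not.)
REPLICATION_FACTOR = 3

def place_replicas(block_ids, datanodes, replication=REPLICATION_FACTOR):
    """Round-robin each block onto `replication` DataNodes."""
    placement = {}
    cycle = itertools.cycle(datanodes)
    for block_id in block_ids:
        placement[block_id] = [next(cycle) for _ in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2"], nodes)

# Simulate losing dn1: every block still has at least two live replicas,
# and the NameNode would schedule re-replication back up to 3 copies.
survivors = {b: [n for n in ns if n != "dn1"] for b, ns in placement.items()}
print(placement)
print(all(len(ns) >= 2 for ns in survivors.values()))  # True
```

When a replica count drops below the target, the NameNode notices via DataNode heartbeats and block reports and schedules new copies automatically, which is exactly the "automatically maintaining multiple copies" behavior listed above.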
Lots of topics will be covered in the next sections: HDFS nodes (master-slave topology), HDFS architecture, HDFS features, HDFS read/write operations, and much more.