Apache Flume Introduction

What is Apache Flume?

Apache Flume is a tool designed for feeding streaming data into HDFS or HBase.

Objective: The main objective of Flume is to capture streaming data from various web servers and deliver it to HDFS.

Introduction

* Flume is a simple pipeline structure with three roles:

  1. Source
  2. Channel
  3. Sink

A source defines where the streaming data comes from, for example network traffic, social media, or email messages.

A channel is the path that connects a source to a sink; it buffers events until the sink consumes them.

A sink is the destination, for example HDFS or HBase.

* It is a distributed system used for aggregating files, such as logs, from many machines into a single location.

* It has built-in features such as reliability, recovery mechanisms, and fault tolerance.

* Apache Flume is used in online analytics applications.

* Flume allows data collection in batch mode as well as streaming mode.

* Flume is used to write data from multiple sources into a single destination.
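
To make the source, channel, and sink roles concrete, here is a minimal sketch in Python of several sources feeding one destination through a shared channel. It is a toy in-memory model and not Flume's own API or configuration format: the names `Channel`, `run_source`, `run_sink`, and `run_pipeline`, the sample server names, and the list standing in for HDFS are all assumptions made for illustration.

```python
import queue
import threading


class Channel:
    """Bounded buffer between sources and a sink (stand-in for a Flume channel)."""

    def __init__(self, capacity=100):
        self._queue = queue.Queue(maxsize=capacity)

    def put(self, event):
        # Blocks when the channel is full, so a slow sink slows the sources down.
        self._queue.put(event)

    def take(self):
        # Blocks until an event is available.
        return self._queue.get()


def run_source(name, lines, channel):
    """Hypothetical source: wraps each input line in an event and hands it to the channel."""
    for line in lines:
        channel.put({"headers": {"source": name}, "body": line})


def run_sink(channel, expected, destination):
    """Hypothetical sink: drains `expected` events into the destination list."""
    for _ in range(expected):
        destination.append(channel.take())


def run_pipeline(inputs):
    """Fan in several sources (name -> lines) to a single destination."""
    channel = Channel()
    destination = []  # stands in for HDFS or HBase in this sketch
    total = sum(len(lines) for lines in inputs.values())

    sink = threading.Thread(target=run_sink, args=(channel, total, destination))
    sink.start()

    sources = [
        threading.Thread(target=run_source, args=(name, lines, channel))
        for name, lines in inputs.items()
    ]
    for thread in sources:
        thread.start()
    for thread in sources:
        thread.join()
    sink.join()
    return destination


if __name__ == "__main__":
    events = run_pipeline({"web-1": ["GET /", "GET /a"], "web-2": ["POST /b"]})
    for event in events:
        print(event["headers"]["source"], event["body"])
```

The order in which events from different sources interleave is not fixed, but events from any one source reach the destination in the order they were produced. A real Flume agent expresses the same wiring declaratively in its configuration rather than in code.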

Why Flume?

  1. Flume is needed to ingest large volumes of streaming data into HDFS or HBase.
  2. It collects and aggregates streaming data.
  3. Flume is able to collect data in real time as well as in batch mode.
  4. Apache Flume is a highly reliable, distributed, and configurable tool.
  5. It is open source software, so beginners and students can easily download it from the net.
  6. It gives low latency and high throughput.
  7. Easy and centralized management using web UI (user interface) or console.

Where can Flume be used?

Some of the companies using Flume:

  1. Goibibo, India’s largest online hotel and travel booking company, generates a huge amount of customer streaming data every day, so it chose Flume to transfer logs from its production systems into HDFS. The data is stored in HDFS for later analysis of business ups and downs.
  2. Mozilla, the organization behind the open source Firefox web browser, uses Flume for the BuildBot project along with Elasticsearch.
  3. Capillary Technologies uses Flume for aggregating logs from 25 machines in production.

Conclusion

Apache Flume is mainly designed to transfer bulk streaming data from various web sources to HDFS. Flume is widely used in online analytics. It is an open source tool in the big data Hadoop ecosystem.

Reference

http://flume.apache.org/index.html

That’s all for this introduction to Apache Flume. It is a good choice for bringing streaming data into the big data Hadoop ecosystem for analysis.