Data Flow mechanism in Apache Flume

Let’s study the data flow mechanism in Apache Flume.

* Flume is an open-source framework used to move streaming data into HDFS. The following diagram explains the data flow mechanism in Flume.

* Web servers generally generate events and log data, and Flume agents run on these servers to collect it.

* The Flume agents running on these servers receive the data from data generators such as Facebook and Twitter.

* The data in these agents is collected by an intermediate node known as a collector. Just as with agents, there can be multiple collectors in Flume.

* Finally, the data from all collectors is aggregated and pushed to a centralized store such as HBase or HDFS.
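The chain described above can be sketched as a minimal Flume agent configuration. This is only an illustrative sketch — the agent name, port, and HDFS path are assumptions, not values from this article:

```properties
# Illustrative single-agent pipeline: one source, one channel, one sink.
# Names (agent1, src1, ch1, sink1), the port, and the HDFS path are assumptions.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Receive events from data generators over Avro RPC
agent1.sources.src1.type = avro
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 41414
agent1.sources.src1.channels = ch1

# Buffer events in memory between the source and the sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Push events into the centralized store (HDFS here)
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
```

An agent is started by pointing the `flume-ng agent` command at a properties file like this one.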

* Flume supports the following types of data flow:

1. Multi-hop Flow

    An event may travel through more than one agent before reaching its final destination; this is known as multi-hop flow.
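    A multi-hop flow is typically wired up by pointing one agent's Avro sink at the next agent's Avro source. A hedged sketch (the agent names, hostname, and port below are assumptions):

    ```properties
    # Illustrative multi-hop setup: agentA's Avro sink forwards events
    # to agentB's Avro source. Hostname and port are assumptions.

    # --- Agent A (first hop): forwards to the collector host ---
    agentA.sinks.avroSink.type = avro
    agentA.sinks.avroSink.hostname = collector.example.com
    agentA.sinks.avroSink.port = 4545
    agentA.sinks.avroSink.channel = chA

    # --- Agent B (second hop): listens on the matching port ---
    agentB.sources.avroSrc.type = avro
    agentB.sources.avroSrc.bind = 0.0.0.0
    agentB.sources.avroSrc.port = 4545
    agentB.sources.avroSrc.channels = chB
    ```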

2. Fan-out Flow

    The flow of data from one source to multiple channels is known as fan-out flow. There are two types of fan-out flow:

Replicating

Here the event is replicated to all the configured channels.

Multiplexing

In this case the event is sent only to selected channels, chosen by a value in the event’s header.
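Both fan-out variants are configured through the source’s channel selector. A hedged sketch, with both variants shown (the agent name, channel names, and the `datacenter` header are illustrative assumptions):

```properties
# Illustrative fan-out: one source feeds two channels; the selector
# decides how. All names here are assumptions.
agent1.sources.src1.channels = ch1 ch2

# Replicating (the default): every event is copied to both ch1 and ch2
agent1.sources.src1.selector.type = replicating

# Multiplexing (alternative): route each event by its "datacenter" header
# agent1.sources.src1.selector.type = multiplexing
# agent1.sources.src1.selector.header = datacenter
# agent1.sources.src1.selector.mapping.us-east = ch1
# agent1.sources.src1.selector.mapping.eu-west = ch2
# agent1.sources.src1.selector.default = ch1
```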

3. Fan-in Flow

In this case data is transferred from many sources into one channel; this is known as fan-in flow.
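Fan-in falls out naturally from pointing several sources at the same channel. A hedged sketch (source names, ports, and types are assumptions chosen for illustration):

```properties
# Illustrative fan-in: two sources on one agent write into the same
# channel. Names and ports are assumptions.
agent1.sources  = syslogSrc avroSrc
agent1.channels = ch1

# Syslog events over TCP
agent1.sources.syslogSrc.type = syslogtcp
agent1.sources.syslogSrc.host = 0.0.0.0
agent1.sources.syslogSrc.port = 5140
agent1.sources.syslogSrc.channels = ch1

# Events from upstream agents over Avro RPC, into the same channel
agent1.sources.avroSrc.type = avro
agent1.sources.avroSrc.bind = 0.0.0.0
agent1.sources.avroSrc.port = 41414
agent1.sources.avroSrc.channels = ch1
```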

Reference

http://flume.apache.org/FlumeUserGuide.html#data-flow-model


That’s all about the data flow mechanism in Flume. I hope this article explains how Flume moves data in a simple way.