Data Flow mechanism in Apache Flume
Let’s study the data flow mechanism in Apache Flume.
* Flume is an open-source framework used to move streaming data into HDFS. The following diagram explains the data flow mechanism in Flume.
* Web servers generally generate events and log data, and Flume agents run on these servers.
* The Flume agents running on these web servers receive the data from data generators such as Facebook, Twitter, etc.
* The data in these agents is collected by an intermediate node known as a Collector. Just like agents, there can be multiple collectors in Flume.
* Finally, the data from all the collectors is aggregated and pushed to a centralized store such as HBase or HDFS.
* Flume supports the following types of data flow:
1. Multi-hop Flow
When an event travels through more than one agent before reaching its final destination, the flow is called a multi-hop flow.
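A multi-hop flow is typically built by chaining agents with an Avro sink on one agent and an Avro source on the next. A minimal sketch of two such agents is shown below; the agent names, host names, paths, and ports are hypothetical placeholders:

```properties
# First hop ("agent1"): tails a local log file and forwards events
# over Avro RPC to the collector host.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = avro-sink

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = collector-host
agent1.sinks.avro-sink.port = 4141
agent1.sinks.avro-sink.channel = ch1

# Second hop ("agent2", running on collector-host): receives events on
# the same port and writes them to HDFS.
agent2.sources = avro-src
agent2.channels = ch1
agent2.sinks = hdfs-sink

agent2.sources.avro-src.type = avro
agent2.sources.avro-src.bind = 0.0.0.0
agent2.sources.avro-src.port = 4141
agent2.sources.avro-src.channels = ch1

agent2.channels.ch1.type = memory

agent2.sinks.hdfs-sink.type = hdfs
agent2.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events
agent2.sinks.hdfs-sink.channel = ch1
```

The key point is that the Avro sink of the first agent and the Avro source of the second agent agree on the host and port, which is what joins the two hops into one flow.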
2. Fan-out Flow
The flow of data from one source to multiple channels is known as fan-out flow. There are two types of fan-out flow:
* Replicating: the data is replicated in all the configured channels.
* Multiplexing: the data is sent to a selected channel, which is mentioned in the header of the event.
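Fan-out is controlled by the channel selector on a source. The fragment below sketches both variants for a hypothetical source `src1` with channels `ch1` and `ch2`; only one selector can be active on a source at a time, and the header name `datatype` is an assumption for illustration:

```properties
# Variant 1 - replicating selector (the default):
# every event is copied to both ch1 and ch2.
agent.sources.src1.channels = ch1 ch2
agent.sources.src1.selector.type = replicating

# Variant 2 - multiplexing selector (alternative to the above):
# each event is routed by the value of its "datatype" header.
# agent.sources.src1.selector.type = multiplexing
# agent.sources.src1.selector.header = datatype
# agent.sources.src1.selector.mapping.logs = ch1
# agent.sources.src1.selector.mapping.metrics = ch2
# agent.sources.src1.selector.default = ch1
```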
3. Fan-in Flow
The data flow in which data is transferred from many sources into one channel is known as fan-in flow.
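In configuration terms, fan-in simply means that several sources name the same channel. A minimal sketch with two hypothetical sources feeding one channel:

```properties
# Two sources, one shared channel: a fan-in flow.
agent.sources = netcat-src avro-src
agent.channels = ch1

agent.sources.netcat-src.type = netcat
agent.sources.netcat-src.bind = localhost
agent.sources.netcat-src.port = 6666
agent.sources.netcat-src.channels = ch1

agent.sources.avro-src.type = avro
agent.sources.avro-src.bind = 0.0.0.0
agent.sources.avro-src.port = 4141
agent.sources.avro-src.channels = ch1

agent.channels.ch1.type = memory
```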
That’s all about the data flow mechanism in Flume. I hope this article explains the workings of Flume in a simple way.