Apache Flume Architecture
Let’s take a deep dive into the Apache Flume architecture.
Figure: Flume Architecture
* The main design goal of the Flume architecture is to reliably move streaming data from many sources into a centralized store.
* Flume is mainly used to feed streaming data from different data sources into HDFS or Hive.
From the above architecture diagram, we can see how Flume works:
* Data generators such as Facebook and Twitter produce real-time streaming data, which is collected by individual Flume agents running on the data-generating nodes. A data collector then gathers the data from these individual agents, aggregates it, and pushes it into a centralized store such as HDFS or HBase.
* A Flume agent is an independent daemon process (a JVM) in Flume.
* An agent receives data from clients or other agents and forwards it to its next destination, i.e., a sink or another agent.
* An agent contains three main components, namely source, channel, and sink, each explained in the data flow model below.
Data Flow Model
A source is the component of a Flume agent that receives data from data generators (for example, a web server) and transfers it to one or more channels in the form of Flume events.
A channel is a path that receives events from the source and buffers them until they are consumed by sinks. We can say it is a bridge between the sources and the sinks.
A sink stores the data in centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the destination.
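These three components are wired together in the agent’s configuration (properties) file. Below is a minimal sketch for a single agent, here named `a1`, that uses a NetCat source, a memory channel, and a logger sink; the agent name, component names, and port are illustrative choices, not fixed values:

```properties
# Name the components of agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for text lines on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory until the sink consumes them
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events to the log (useful for testing)
a1.sinks.k1.type = logger

# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

With this file saved as, say, `example.conf`, the agent can be started with a command along the lines of `flume-ng agent --conf conf --conf-file example.conf --name a1`. Note that a source can feed multiple channels (`channels`, plural), while a sink drains exactly one (`channel`, singular).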
Examples of Flume sources, channels, and sinks are shown in the table below.
| Sources | Channels | Sinks |
| --- | --- | --- |
| Avro source | Memory channel | HDFS sink |
| Exec source | JDBC channel | Logger sink |
| Spooling directory source | File channel | Avro sink |
| NetCat source | Custom channel | IRC sink |
| Sequence generator source | | File Roll sink |
| Syslog source | | Null sink |
| HTTP source | | HBase sink |
| Custom source | | AsyncHBase sink |
| Scribe source | | Elasticsearch sink |
| | | Custom sink |
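The Avro sink and Avro source from the table are what make the multi-agent flow described earlier possible: an Avro sink on one agent sends events over the network to an Avro source on a collector agent. The following is a hedged sketch of such a tiered setup; the agent names, hostname, ports, and HDFS path are illustrative assumptions:

```properties
# --- Agent "web" on the data-generating node ---
web.sources = r1
web.channels = c1
web.sinks = k1

# Tail an application log file
web.sources.r1.type = exec
web.sources.r1.command = tail -F /var/log/app.log

# Durable file channel to survive restarts
web.channels.c1.type = file

# Forward events to the collector over Avro RPC
web.sinks.k1.type = avro
web.sinks.k1.hostname = collector.example.com
web.sinks.k1.port = 4141

web.sources.r1.channels = c1
web.sinks.k1.channel = c1

# --- Agent "coll" on the collector node ---
coll.sources = r1
coll.channels = c1
coll.sinks = k1

# Receive events from upstream Avro sinks
coll.sources.r1.type = avro
coll.sources.r1.bind = 0.0.0.0
coll.sources.r1.port = 4141

coll.channels.c1.type = file

# Aggregate into HDFS, bucketed by date
coll.sinks.k1.type = hdfs
coll.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
coll.sinks.k1.hdfs.fileType = DataStream
coll.sinks.k1.hdfs.useLocalTimeStamp = true

coll.sources.r1.channels = c1
coll.sinks.k1.channel = c1
```

Each agent is started separately with its own `flume-ng agent ... --name web` or `--name coll` command; many `web`-style agents can point at the same collector, which is exactly the fan-in aggregation pattern shown in the architecture diagram.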
That’s all about the Flume architecture. This article gives a simple and detailed description of the Flume architecture; I hope it is useful for beginners.