Apache Flume Configuration and Setup
This article shows how to set up the configuration file for a Flume agent once Flume has been installed.
Configuring a Flume agent involves the following steps:
- Name the components of the current agent.
- Configure the source.
- Describe or configure the sink.
- Configure the channel.
- Bind the source and the sink to the channel.
1. Naming the Components of the agent
In this step we list all the components of the agent, that is, its sources, sinks, and channels, as shown below.
agent_name.sources = source_name
agent_name.sinks = sink_name
agent_name.channels = channel_name
Note: Flume supports various sources, sinks, and channels (a table listing these components is given under Flume architecture).
In this example we transfer Twitter data from a Twitter source through a memory channel to an HDFS sink, and the agent is named TwitterAgent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
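An agent is not limited to one component of each kind. When it has several, the names are given as a space-separated list. The following is a minimal sketch; the agent name a1 and the component names r1, r2, c1, c2, k1, and k2 are hypothetical and do not belong to the Twitter example.

```properties
# Hypothetical agent "a1" with two sources, two channels, and two sinks.
# Each name listed here must be configured further down in the same file.
a1.sources = r1 r2
a1.channels = c1 c2
a1.sinks = k1 k2
```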
2. Configure the source
In this step we configure the source with its type and its other properties. The type property is common to every source and specifies which kind of source we are using; the remaining properties depend on that type.
agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value
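In this example we use the Twitter source with the following properties. The type is the fully qualified class name of Flume's Twitter source, and the masked values stand for your own Twitter API credentials.
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = ***********
TwitterAgent.sources.Twitter.consumerSecret = ************
TwitterAgent.sources.Twitter.accessToken = *************
TwitterAgent.sources.Twitter.accessTokenSecret = *************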
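To see how the type decides which other properties apply, it can help to compare with a different source. The sketch below assumes a hypothetical agent a1 with a source named r1 that uses Flume's built-in netcat source; the address and port are example values only.

```properties
# Hypothetical agent "a1" with a single source named "r1"
a1.sources = r1
# "netcat" is the short type name of Flume's built-in netcat source
a1.sources.r1.type = netcat
# Properties specific to the netcat source: the address and port to listen on
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
```

The property prefix (agent name, component kind, component name) stays the same; only the type value and the type-specific keys change.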
3. Configure the sink
In this step we configure the sink with its type and its other properties. Here, too, the ‘type’ property is common to every sink and specifies which kind of sink we are using.
agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value
In this example we use an HDFS sink with the following properties.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/beyond-corner/twitterdata/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
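The roll settings control when the HDFS sink closes the current file and starts a new one: hdfs.rollSize is a size in bytes, hdfs.rollCount is a number of events, and hdfs.rollInterval is a time in seconds. A value of 0 disables the corresponding trigger. Since hdfs.rollInterval is not set above, its default (30 seconds according to the Flume user guide) would still apply. The following is a sketch only, assuming you want files to roll on event count alone.

```properties
# Disable time-based rolling (0 = never roll on a time interval)
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
# Disable size-based rolling (0 = never roll on file size)
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
# Start a new file after every 10000 events
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
```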
4. Configure the channel
In this step we configure the channel with its type and its other properties. In Flume, a channel acts as a bridge between a source and a sink.
agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value
In this example we use a memory channel and configure it with the following properties.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
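For the memory channel, capacity is the maximum number of events held in the channel, and transactionCapacity is the maximum number of events handled per transaction. A memory channel loses its events if the agent stops, so a file channel is a common alternative when durability matters. The sketch below is an assumption-laden variant, not part of the original example: the channel name FileChannel and both directory paths are hypothetical.

```properties
# Hypothetical durable alternative to MemChannel
TwitterAgent.channels = FileChannel
# "file" is the short type name of Flume's built-in file channel
TwitterAgent.channels.FileChannel.type = file
# Example paths only; use directories the Flume process can write to
TwitterAgent.channels.FileChannel.checkpointDir = /var/flume/checkpoint
TwitterAgent.channels.FileChannel.dataDirs = /var/flume/data
```

If you switch channels this way, the source and sink bindings must refer to the new channel name as well.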
5. Bind the source and the sink to the channel
A channel acts as a bridge between a source and a sink, so in this step we bind both of them to the channel. Note that a source uses the plural key channels, while a sink uses the singular key channel.
agent_name.sources.source_name.channels = channel_name
agent_name.sinks.sink_name.channel = channel_name
In this example we bind the Twitter source and the HDFS sink to the memory channel.
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
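The source key is plural because a source can write to several channels, while each sink reads from exactly one. A minimal sketch of such a fan-out follows; the agent name a1 and the component names r1, c1, c2, k1, and k2 are hypothetical, and the type and other properties of each component are omitted for brevity.

```properties
# Hypothetical agent "a1": one source feeding two channels
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
# A source may list several channels (plural key "channels")
a1.sources.r1.channels = c1 c2
# Each sink drains exactly one channel (singular key "channel")
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```

With Flume's default channel selector, each event from r1 is copied to both c1 and c2.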
Starting a Flume Agent
Once the configuration is complete, start the agent.
$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
| agent | Command to start the Flume agent |
| --conf, -c <conf> | Use the configuration files in the <conf> directory |
| -f <file> | Specifies the path to the agent's configuration file |
| --name, -n <name> | Name of the agent to start (TwitterAgent in this example) |
| -Dproperty=value | Sets a Java system property value |
That covers Apache Flume configuration and setup. We hope this article gives you a basic idea of how to configure a source, a sink, and a channel.