Apache Flume Configuration and Setup

In this article we will see how to write the Flume configuration file and start an agent after installing Flume.

Flume configuration involves the following steps:

  1. Name the components of the current agent.
  2. Configure the source.
  3. Describe or configure the sink.
  4. Configure the channel.
  5. Bind the source and the sink to the channel.
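The five steps can be pictured as one small program that writes the configuration file in order. The Java sketch below is only an illustration: the AgentConfigBuilder class and its methods are hypothetical helpers, not part of Flume, and it uses the same placeholder names (agent_name, source_name, sink_name, channel_name) as this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Flume code): emits configuration lines
// in the order of the five configuration steps.
final class AgentConfigBuilder {
    private final String agent;
    private final List<String> lines = new ArrayList<>();

    AgentConfigBuilder(String agent) {
        this.agent = agent;
    }

    // Step 1: declare the component names of the agent.
    AgentConfigBuilder names(String source, String sink, String channel) {
        lines.add(agent + ".sources = " + source);
        lines.add(agent + ".sinks = " + sink);
        lines.add(agent + ".channels = " + channel);
        return this;
    }

    // Steps 2 to 4: one property of a component; kind is "sources", "sinks" or "channels".
    AgentConfigBuilder property(String kind, String name, String key, String value) {
        lines.add(agent + "." + kind + "." + name + "." + key + " = " + value);
        return this;
    }

    // Step 5: the source uses "channels" (plural), the sink uses "channel" (singular).
    AgentConfigBuilder bind(String source, String sink, String channel) {
        lines.add(agent + ".sources." + source + ".channels = " + channel);
        lines.add(agent + ".sinks." + sink + ".channel = " + channel);
        return this;
    }

    String build() {
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) {
        String conf = new AgentConfigBuilder("agent_name")
                .names("source_name", "sink_name", "channel_name")
                .property("sources", "source_name", "type", "value")
                .property("sinks", "sink_name", "type", "value")
                .property("channels", "channel_name", "type", "value")
                .bind("source_name", "sink_name", "channel_name")
                .build();
        System.out.print(conf);
    }
}
```

Flume reads the file as a set of key/value properties, so the line order does not matter to Flume itself; following the five steps simply keeps the file easy to read.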

1. Name the components of the agent

In this step we list all the components of the agent, that is, its sources, sinks, and channels, as shown below.

agent_name.sources = source_name

agent_name.sinks = sink_name

agent_name.channels = channel_name

Note: Flume supports various sources, sinks, and channels (a table listing these components is given in the Flume architecture article).

Example

In this example we transfer Twitter data from a Twitter source, through a memory channel, to an HDFS sink. The agent name is TwitterAgent.

TwitterAgent.sources = Twitter

TwitterAgent.channels = MemChannel

TwitterAgent.sinks = HDFS
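Flume's configuration file follows the Java properties format, so the standard java.util.Properties class can load it. The sketch below reads the names declared in step 1 for the TwitterAgent example; the ComponentNames class is a hypothetical helper, not Flume code. A single value may list several components separated by whitespace (for example agent_name.sources = source1 source2), so the value is split on whitespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: lists the component names declared for an agent.
final class ComponentNames {
    // kind is "sources", "sinks" or "channels".
    static List<String> names(Properties conf, String agent, String kind) {
        String value = conf.getProperty(agent + "." + kind);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        // Several names may be given, separated by whitespace.
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) throws IOException {
        String text = "TwitterAgent.sources = Twitter\n"
                + "TwitterAgent.channels = MemChannel\n"
                + "TwitterAgent.sinks = HDFS\n";
        Properties conf = new Properties();
        conf.load(new StringReader(text));
        for (String kind : new String[] {"sources", "channels", "sinks"}) {
            System.out.println(kind + ": " + names(conf, "TwitterAgent", kind));
        }
    }
}
```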

2. Configure the source

In this step we configure the source with its type and properties. The type property is common to every source and specifies which kind of source we are using.

agent_name.sources.source_name.type = value

agent_name.sources.source_name.property2 = value

agent_name.sources.source_name.property3 = value
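Every property of one component shares the prefix agent_name.sources.source_name., so a component's settings can be gathered by that prefix. The sketch below shows the idea with java.util.Properties; the ComponentProperties class is a hypothetical helper, not Flume's own configuration code.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: collects the properties of one source, sink or channel.
final class ComponentProperties {
    static Map<String, String> of(Properties conf, String agent, String kind, String name) {
        String prefix = agent + "." + kind + "." + name + ".";
        Map<String, String> result = new TreeMap<>();
        for (String key : conf.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                // Keep only the part after the prefix, e.g. "type" or "property2".
                result.put(key.substring(prefix.length()), conf.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("agent_name.sources.source_name.type", "value");
        conf.setProperty("agent_name.sources.source_name.property2", "value");
        conf.setProperty("agent_name.sinks.sink_name.type", "value");
        // Prints only the two source properties; the sink property has a different prefix.
        System.out.println(of(conf, "agent_name", "sources", "source_name"));
    }
}
```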

Example

In this example we use Flume's Twitter source with the following properties. Replace the asterisks with your own Twitter API credentials.

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource

TwitterAgent.sources.Twitter.consumerKey = ***********

TwitterAgent.sources.Twitter.consumerSecret = ************

TwitterAgent.sources.Twitter.accessToken = *************

TwitterAgent.sources.Twitter.accessTokenSecret = **********
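A missing credential is an easy mistake to make, so it can help to check the file before starting the agent. The sketch below is a hypothetical pre-flight check, not part of Flume: it only reports which of the four credential properties from the example above are absent or empty, and it cannot tell whether a value is a real credential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the Twitter source credentials.
final class TwitterCredentialCheck {
    static final String[] REQUIRED = {
        "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"
    };

    // Returns the credential keys that are absent or blank for the given source.
    static List<String> missing(Properties conf, String agent, String source) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = conf.getProperty(agent + ".sources." + source + "." + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("TwitterAgent.sources.Twitter.consumerKey", "xxxx");
        conf.setProperty("TwitterAgent.sources.Twitter.consumerSecret", "xxxx");
        // accessToken and accessTokenSecret are deliberately left unset here.
        System.out.println("Missing: " + missing(conf, "TwitterAgent", "Twitter"));
    }
}
```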

3. Configure the sink

In this step we configure the sink with its type and properties. Here too, the type property is common to every sink and specifies which kind of sink we are using.

agent_name.sinks.sink_name.type = value

agent_name.sinks.sink_name.property2 = value

agent_name.sinks.sink_name.property3 = value

Example

In this example we use an HDFS sink with the following properties.

TwitterAgent.sinks.HDFS.type = hdfs

TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/beyond-corner/twitterdata/

TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream

TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000

TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
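The batch and roll settings interact through simple arithmetic. Per the Flume user guide, hdfs.batchSize is the number of events written to a file before it is flushed to HDFS, hdfs.rollCount is the number of events written before the file is rolled (closed and replaced by a new one), and hdfs.rollSize = 0 disables size-based rolling. The sketch below only models count-based rolling; hdfs.rollInterval is not set in the example, and the user guide lists a 30-second default for it, so in practice files may roll sooner. HdfsRollMath is a hypothetical helper.

```java
// Hypothetical helper: arithmetic behind hdfs.batchSize and hdfs.rollCount.
// Only count-based rolling is modelled; time-based rolling (hdfs.rollInterval) is ignored.
final class HdfsRollMath {
    // Number of full batches written to one file before it rolls by event count.
    static long batchesPerFile(long rollCount, long batchSize) {
        if (rollCount <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rollCount and batchSize must be positive");
        }
        return rollCount / batchSize;
    }

    // Files produced for a number of events by count-based rolling; the last one may be partial.
    static long filesFor(long totalEvents, long rollCount) {
        return (totalEvents + rollCount - 1) / rollCount;
    }

    public static void main(String[] args) {
        long batchSize = 1000;  // hdfs.batchSize in the example
        long rollCount = 10000; // hdfs.rollCount in the example
        System.out.println("Batches per file: " + batchesPerFile(rollCount, batchSize)); // 10
        System.out.println("Files for 25000 events: " + filesFor(25000, rollCount));     // 3
    }
}
```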

4. Configure the channel

In this step we configure the channel with its type and properties. In Flume, the channel acts as a bridge between the source and the sink: the source puts events into the channel and the sink takes them out.

agent_name.channels.channel_name.type = value

agent_name.channels.channel_name.property2 = value

agent_name.channels.channel_name.property3 = value
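To make the bridge idea concrete, here is an analogy in plain Java rather than Flume's API: a bounded java.util.concurrent.ArrayBlockingQueue stands in for a memory channel, its size limit plays the role of the channel's capacity, the fill method plays the source, and the drain method plays the sink. The ChannelAnalogy class is hypothetical; Flume's real channels also wrap puts and takes in transactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy only, not Flume code: a bounded queue between a producer and a consumer.
final class ChannelAnalogy {
    // "Source" side: try to put n events; offer() returns false instead of blocking when full.
    static int fill(BlockingQueue<String> channel, int n) {
        int accepted = 0;
        for (int i = 1; i <= n; i++) {
            if (channel.offer("event-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    // "Sink" side: take at most batchSize events in one go, oldest first.
    static List<String> drain(BlockingQueue<String> channel, int batchSize) {
        List<String> batch = new ArrayList<>();
        channel.drainTo(batch, batchSize);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(3); // like capacity = 3
        System.out.println("Accepted: " + fill(channel, 5));         // 3 of 5 fit
        System.out.println("Sink took: " + drain(channel, 2));       // [event-1, event-2]
        System.out.println("Still buffered: " + channel.size());     // 1
    }
}
```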

Example

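In this example we use a memory channel with the following properties. The capacity property is the maximum number of events stored in the channel, and transactionCapacity is the maximum number of events the channel accepts from the source or gives to the sink in a single transaction. The transactionCapacity should therefore be at least as large as the HDFS sink's hdfs.batchSize (1000 in this example).

TwitterAgent.channels.MemChannel.type = memory

TwitterAgent.channels.MemChannel.capacity = 10000

TwitterAgent.channels.MemChannel.transactionCapacity = 1000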
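Channel sizes are easy to get wrong, so a quick sanity check can help. The sketch below assumes the meanings given in the Flume user guide: capacity is the maximum number of events stored in the channel, and transactionCapacity is the maximum number of events taken from a source or given to a sink per transaction. ChannelSizing is a hypothetical helper, and the numbers in main are illustrative values, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: sanity-checks memory channel sizes against a component's batch size.
final class ChannelSizing {
    static List<String> problems(long capacity, long transactionCapacity, long batchSize) {
        List<String> problems = new ArrayList<>();
        if (transactionCapacity > capacity) {
            problems.add("transactionCapacity (" + transactionCapacity
                    + ") exceeds capacity (" + capacity + ")");
        }
        if (batchSize > transactionCapacity) {
            // A batch this large cannot be moved in one channel transaction.
            problems.add("batch size (" + batchSize
                    + ") exceeds transactionCapacity (" + transactionCapacity + ")");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Illustrative values: a sink batch of 1000 against 100 events per transaction.
        System.out.println(problems(10000, 100, 1000));
        // Illustrative values that pass both checks.
        System.out.println(problems(10000, 1000, 1000));
    }
}
```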

5. Bind the source and the sink to the channel

In Flume, the channel acts as a bridge between the source and the sink, so in this step we bind both the source and the sink to the channel.

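agent_name.sources.source_name.channels = channel_name

agent_name.sinks.sink_name.channel = channel_name

Note: the source property is channels (plural), because a source can write to more than one channel, while the sink property is channel (singular), because a sink reads from exactly one channel.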

Example

In this example we bind the Twitter source and the HDFS sink to the memory channel MemChannel.

TwitterAgent.sources.Twitter.channels = MemChannel

TwitterAgent.sinks.HDFS.channel = MemChannel
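A binding mistake, such as writing channels instead of channel for a sink, leaves a component unconnected. The sketch below checks step 5 against the names declared in step 1; BindingCheck is a hypothetical helper built on java.util.Properties, not Flume's own validation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper: verifies that every source and sink is bound to a declared channel.
final class BindingCheck {
    private static List<String> split(String value) {
        if (value == null || value.trim().isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    static List<String> errors(Properties conf, String agent) {
        List<String> errors = new ArrayList<>();
        Set<String> channels = new HashSet<>(split(conf.getProperty(agent + ".channels")));
        // Sources use the plural key "channels", which may list several channels.
        for (String source : split(conf.getProperty(agent + ".sources"))) {
            List<String> bound = split(conf.getProperty(agent + ".sources." + source + ".channels"));
            if (bound.isEmpty()) {
                errors.add("source " + source + " is not bound to any channel");
            }
            for (String c : bound) {
                if (!channels.contains(c)) {
                    errors.add("source " + source + " uses undeclared channel " + c);
                }
            }
        }
        // Sinks use the singular key "channel" and name exactly one channel.
        for (String sink : split(conf.getProperty(agent + ".sinks"))) {
            String c = conf.getProperty(agent + ".sinks." + sink + ".channel");
            if (c == null || c.trim().isEmpty()) {
                errors.add("sink " + sink + " has no 'channel' property");
            } else if (!channels.contains(c.trim())) {
                errors.add("sink " + sink + " uses undeclared channel " + c.trim());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("TwitterAgent.sources", "Twitter");
        conf.setProperty("TwitterAgent.channels", "MemChannel");
        conf.setProperty("TwitterAgent.sinks", "HDFS");
        conf.setProperty("TwitterAgent.sources.Twitter.channels", "MemChannel");
        conf.setProperty("TwitterAgent.sinks.HDFS.channel", "MemChannel");
        System.out.println("Errors: " + errors(conf, "TwitterAgent"));
    }
}
```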

Starting a Flume Agent

Once the configuration is complete (saved here as conf/twitter.conf), start the agent from the Flume installation directory:

$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

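Where:

agent: the command that starts a Flume agent
--conf, -c <conf>: use the configuration files (such as flume-env.sh and log4j.properties) in the <conf> directory
--conf-file, -f <file>: path of the agent configuration file (required unless a ZooKeeper connection is given with -z)
--name, -n <name>: name of the agent to start; it must match the agent name used in the configuration file (TwitterAgent here)
-D<property>=<value>: sets a Java system property; here flume.root.logger=DEBUG,console sends DEBUG-level logs to the console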
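The same command can be assembled programmatically, for instance to launch Flume from a Java program with java.lang.ProcessBuilder. The FlumeCommand class below is a hypothetical helper: it only builds and prints the argument list for the command shown above, and the commented ProcessBuilder line assumes the program runs from the Flume installation directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the flume-ng agent command line as an argument list.
final class FlumeCommand {
    static List<String> command(String confDir, String confFile, String agentName, String rootLogger) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bin/flume-ng");
        cmd.add("agent");
        cmd.add("--conf");   // directory with flume-env.sh, log4j.properties, ...
        cmd.add(confDir);
        cmd.add("-f");       // agent configuration file
        cmd.add(confFile);
        if (rootLogger != null) {
            cmd.add("-Dflume.root.logger=" + rootLogger); // Java system property
        }
        cmd.add("-n");       // must match the agent name in the configuration file
        cmd.add(agentName);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command("./conf/", "conf/twitter.conf", "TwitterAgent", "DEBUG,console");
        System.out.println(String.join(" ", cmd));
        // To actually launch it (requires a local Flume installation):
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```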

 

That's all for Apache Flume configuration and setup. I hope this article gives you a basic idea of how to configure sources, sinks, and channels.