Apache Flume Sequence Generator source

In this article, we will study how to fetch data from the Sequence Generator source.

Introduction

The Sequence Generator is a source that continuously generates events (each event represents a message, token, count, pattern, or value). A counter is maintained over the generated events: it starts at 0 and increments by 1 for every event.
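The counter behavior can be pictured with a plain shell loop; this is only an illustration of the event bodies the source emits, not Flume code:

```shell
# Illustration only: mimic the seq source's counter, which starts
# at 0 and increments by 1 for every generated event.
for i in $(seq 0 4); do
  echo "$i"   # each number becomes the body of one Flume event
done
```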

Usage

* It is used for testing and debugging purposes.

* It generates a large volume of data very quickly compared to the NetCat source.

* Because it needs no external input, the Sequence Generator can feed a channel at a higher rate than most other sources, which makes it a convenient choice for stress-testing a Flume pipeline with large volumes of data.

Software Requirements

* A machine with Linux operating system.

* Apache Hadoop should be installed (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)

* Apache Flume should be installed (http://flume.apache.org/FlumeUserGuide.html)

Example

In this example, we will see how to fetch data from the Sequence Generator source. Let us consider:

Source: Sequence Generator

Sink: HDFS

Channel: Memory

Step 1: Change the directory to /usr/local/hadoop/hduser.

$ cd /usr/local/hadoop/hduser

Step 2: Start all Hadoop daemons. (On recent Hadoop releases, start-dfs.sh and start-yarn.sh replace the deprecated start-all.sh.)

$ start-all.sh

Step 3: Check the JVM (Java Virtual Machine) status; the Hadoop daemons should appear in the list.

$ jps

Step 4: Change the directory to the Flume installation ($FLUME_HOME, here /usr/local/flume).

$ cd $FLUME_HOME

Step 5: Configure the seq_gen.conf file.

Here we configure the “seq_gen.conf” file; save it inside the Flume folder.

# Naming the components on the current agent

SeqGenAgent.sources = SeqSource

SeqGenAgent.channels = MemChannel

SeqGenAgent.sinks = HDFS

# Describing/Configuring the source

SeqGenAgent.sources.SeqSource.type = seq

# Describing/Configuring the sink

SeqGenAgent.sinks.HDFS.type = hdfs

SeqGenAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hduser/flumedata/seqgen_data/

SeqGenAgent.sinks.HDFS.hdfs.filePrefix = log

SeqGenAgent.sinks.HDFS.hdfs.rollInterval = 0

SeqGenAgent.sinks.HDFS.hdfs.rollCount = 10000

SeqGenAgent.sinks.HDFS.hdfs.fileType = DataStream

# Describing/Configuring the channel

SeqGenAgent.channels.MemChannel.type = memory

SeqGenAgent.channels.MemChannel.capacity = 1000

SeqGenAgent.channels.MemChannel.transactionCapacity = 100

# Binding source and sink to the channel

SeqGenAgent.sources.SeqSource.channels = MemChannel

SeqGenAgent.sinks.HDFS.channel = MemChannel
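If you prefer creating the file from the shell, the listing above can be written in one step with a here-document (writing into the current directory here; adjust the path to your Flume folder):

```shell
# Write the agent configuration shown above to seq_gen.conf.
cat > seq_gen.conf <<'EOF'
SeqGenAgent.sources = SeqSource
SeqGenAgent.channels = MemChannel
SeqGenAgent.sinks = HDFS

SeqGenAgent.sources.SeqSource.type = seq

SeqGenAgent.sinks.HDFS.type = hdfs
SeqGenAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hduser/flumedata/seqgen_data/
SeqGenAgent.sinks.HDFS.hdfs.filePrefix = log
SeqGenAgent.sinks.HDFS.hdfs.rollInterval = 0
SeqGenAgent.sinks.HDFS.hdfs.rollCount = 10000
SeqGenAgent.sinks.HDFS.hdfs.fileType = DataStream

SeqGenAgent.channels.MemChannel.type = memory
SeqGenAgent.channels.MemChannel.capacity = 1000
SeqGenAgent.channels.MemChannel.transactionCapacity = 100

SeqGenAgent.sources.SeqSource.channels = MemChannel
SeqGenAgent.sinks.HDFS.channel = MemChannel
EOF
```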

Step 6: Execution

Here the source, the Sequence Generator, starts generating sequence numbers, which are pushed into HDFS in the form of log files.

$ cd $FLUME_HOME

$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/seq_gen.conf --name SeqGenAgent

Step 7: Verify the output in HDFS.

$ hadoop fs -ls /user/hduser/flumedata/seqgen_data/
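After copying a rolled file out of HDFS (for example with hadoop fs -get), you can sanity-check that the generated sequence has no gaps. The snippet below demonstrates the check on a locally generated stand-in file; seq.sample is a hypothetical name, so point the awk command at your real log file instead:

```shell
# Stand-in for a file fetched from HDFS with hadoop fs -get.
seq 0 9999 > seq.sample

# Verify the numbers are consecutive: each line must equal the previous + 1.
awk 'NR > 1 && $1 != prev + 1 { print "gap before line " NR; bad = 1 }
     { prev = $1 }
     END { exit bad }' seq.sample && echo "sequence is gapless"
```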

That’s all about the Apache Flume Sequence Generator source. It is a handy tool for testing; hope this will be useful.