Apache Flume NetCat Source  

In this article we will study how to fetch data using the Apache Flume NetCat source.

Introduction

NetCat is a computer networking utility for reading from and writing to network connections using TCP or UDP. Flume provides a source of the same name that behaves like this utility: it generates events (an event represents some message, token, count, pattern, or value), and in this example those events are logged to the console. For this source we have to specify the port. It listens on the given port, receives each line we enter on that port as an individual event, and transfers it to the sink through the specified channel.

Usage: Compared to other Flume sources, the NetCat source is a good fit for receiving line-oriented text sent over a network connection.

Logger is a sink that logs all the events passed to it. It is used for testing or debugging purposes.

Software Requirements

* A machine with Linux operating system.

* Apache Hadoop should be installed (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)

* Apache Flume should be installed (http://flume.apache.org/FlumeUserGuide.html)

Example

In this example we will see how to fetch data from the NetCat source. Let us consider:

Source: NetCat

Sink: logger

Channel: Memory

Step 1: Change the directory to /usr/local/hadoop/sbin.

$ cd /usr/local/hadoop/sbin

Step 2: Start all Hadoop daemons.

$ start-all.sh

Step 3: Let’s check the status of the running JVM (Java Virtual Machine) processes.

$ jps

Step 4: Change the directory to /usr/local/flume.

$ cd $FLUME_HOME

Step 5: Configuration of the netcat.conf file.

Here we configure the “netcat.conf” file. Copy the following into the Flume configuration folder ($FLUME_CONF), which is where Step 6 expects to find it.

# Naming the components on the current agent

NetcatAgent.sources = Netcat

NetcatAgent.channels = MemChannel

NetcatAgent.sinks = LoggerSink

# Describing/Configuring the source

NetcatAgent.sources.Netcat.type = netcat

NetcatAgent.sources.Netcat.bind = localhost

NetcatAgent.sources.Netcat.port = 56565

# Describing/Configuring the sink

NetcatAgent.sinks.LoggerSink.type = logger

# Describing/Configuring the channel

NetcatAgent.channels.MemChannel.type = memory

NetcatAgent.channels.MemChannel.capacity = 1000

NetcatAgent.channels.MemChannel.transactionCapacity = 100

# Bind the source and sink to the channel

NetcatAgent.sources.Netcat.channels = MemChannel

NetcatAgent.sinks.LoggerSink.channel = MemChannel
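As a quick sanity check of the configuration above, the following Python sketch parses the properties text and confirms that every source and sink is bound to a channel that the agent declares. The helper names parse_properties and check_wiring are our own, chosen for illustration, and are not part of Flume. The sketch only checks the wiring and does not validate component types or ports.

```python
# Illustrative helpers (not part of Flume): parse a Flume properties file
# and check that sources and sinks are bound to declared channels.

NETCAT_CONF = """
NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks = LoggerSink
NetcatAgent.sources.Netcat.type = netcat
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56565
NetcatAgent.sinks.LoggerSink.type = logger
NetcatAgent.channels.MemChannel.type = memory
NetcatAgent.channels.MemChannel.capacity = 1000
NetcatAgent.channels.MemChannel.transactionCapacity = 100
NetcatAgent.sources.Netcat.channels = MemChannel
NetcatAgent.sinks.LoggerSink.channel = MemChannel
"""


def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in the source/sink-to-channel bindings."""
    problems = []
    channels = set(props.get(agent + ".channels", "").split())
    # A source may feed several channels, so its key is plural: "channels".
    for source in props.get(agent + ".sources", "").split():
        bound = props.get(agent + ".sources." + source + ".channels", "").split()
        if not bound:
            problems.append("source " + source + " has no channels")
        for channel in bound:
            if channel not in channels:
                problems.append("source " + source + " uses undeclared channel " + channel)
    # A sink reads from exactly one channel, so its key is singular: "channel".
    for sink in props.get(agent + ".sinks", "").split():
        channel = props.get(agent + ".sinks." + sink + ".channel")
        if channel is None:
            problems.append("sink " + sink + " has no channel")
        elif channel not in channels:
            problems.append("sink " + sink + " uses undeclared channel " + channel)
    return problems


print(check_wiring(parse_properties(NETCAT_CONF), "NetcatAgent"))  # prints []
```

An empty list means the source and the sink both point at MemChannel, which is the channel the agent declares.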

Step 6: Execution

$ cd $FLUME_HOME

$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/netcat.conf --name NetcatAgent -Dflume.root.logger=INFO,console
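The agent takes a moment to start before the source begins listening. A minimal Python sketch for waiting until the port accepts connections is given here. The function name wait_for_port and its default timeout values are our own assumptions for illustration; the port to pass is 56565, as set in netcat.conf.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connect and close immediately; no line is sent, so no event is created.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the source is not listening yet.
            time.sleep(interval)
    return False
```

For example, wait_for_port("localhost", 56565) returns True as soon as the NetCat source is reachable, and False if nothing is listening after 30 seconds.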

Step 7: Passing Data to the Source

Here we pass data to the source through its port. Open a new terminal and connect using the command below. When the connection is successful, a “connected” message will be displayed.

$ curl telnet://localhost:56565

Connected

Note: The NetCat source receives data line by line. It treats each line as an individual event and replies with the message “OK” for each one.
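The same exchange can also be scripted. The following Python sketch sends each string as one newline-terminated line and collects the reply to each, assuming the source acknowledges every event with “OK” as described in the note. The function name send_lines is our own for illustration; the host and port are the ones set in netcat.conf.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each string as one line (one event); return the reply to each line."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn, \
            conn.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in lines:
            # The trailing newline is what marks the end of one event.
            conn.sendall(line.encode("utf-8") + b"\n")
            # Wait for the acknowledgement line before sending the next event.
            replies.append(reader.readline().strip())
    return replies
```

With the agent from Step 6 running, calling send_lines("localhost", 56565, ["hello", "world"]) should send two events, and each one should then show up in the agent's console through the logger sink.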

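Step 8: Verify the output

With the logger sink configured above, the events are printed to the console of the terminal where the agent is running. The HDFS listing below shows data only if the agent is configured with an HDFS sink that writes to this path instead of the logger sink.

$ hdfs dfs -ls /user/hduser1/flumedata/NetCat_data/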

That’s all about the Apache Flume NetCat source. Using the NetCat source we can get streaming data from network traffic, social media, and email messages into big data systems for analysis or storage. I hope this will be useful.