Apache Flume NetCat Source
In this article we will study how to fetch data from the Apache Flume NetCat source.
NetCat is both a Flume source and a computer networking utility for reading from and writing to network connections using TCP or UDP. As a Flume source, it generates events (each representing some message, token, count, pattern, or value) and logs (information about network traffic) that can be shown on the console. For this source we have to specify a port: it listens on the given port, receives each line entered on that port as an individual event, and transfers it to the sink through the specified channel.
Usage: Compared to other Flume sources, the NetCat source is best suited to receiving network traffic.
Logger is a sink that logs all the events passed to it; it is used for testing or debugging purposes.
Prerequisites:
* A machine with a Linux operating system.
* Apache Hadoop should be installed (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)
* Apache Flume should be installed (http://flume.apache.org/FlumeUserGuide.html)
In this example we will see how to fetch data from the NetCat source.
Step 1: Change the directory to /usr/local/hadoop/sbin.
$ cd /usr/local/hadoop/sbin
Step 2: Start all Hadoop daemons.
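The command for this step is not shown above. As a minimal sketch, assuming a Hadoop 2.x or 3.x layout in which `start-dfs.sh` and `start-yarn.sh` live in the sbin directory from Step 1, the daemons could be started like this. The function name `start_hadoop` and the default path are illustrative, not part of Hadoop.

```bash
# start_hadoop [SBIN_DIR]
# Hypothetical helper: starts the HDFS daemons, then the YARN daemons.
# SBIN_DIR defaults to the directory used in Step 1 (an assumption).
start_hadoop() {
  sbin_dir=${1:-/usr/local/hadoop/sbin}

  # Fail early with a clear message if the scripts are not where we expect.
  if [ ! -x "$sbin_dir/start-dfs.sh" ]; then
    echo "start-dfs.sh not found in $sbin_dir" >&2
    return 1
  fi

  # Start HDFS first; only start YARN if HDFS came up without error.
  "$sbin_dir/start-dfs.sh" && "$sbin_dir/start-yarn.sh"
}
```

Calling `start_hadoop` with no argument uses the Step 1 directory; pass another path if Hadoop is installed elsewhere.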
Step 3: Check the status of the running JVM (Java Virtual Machine) processes.
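The JDK's `jps` tool lists the running JVM processes, one per line as a process ID followed by a name. The sketch below reads that output and reports which daemons are absent. The list of expected daemons assumes a single-node setup running both HDFS and YARN, and the function name `missing_daemons` is hypothetical.

```bash
# missing_daemons
# Reads `jps` output on stdin and prints the name of every expected
# Hadoop daemon that does not appear in it (prints nothing if all are up).
missing_daemons() {
  jps_output=$(cat)
  # Assumed daemon set for a single-node HDFS + YARN installation.
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so that "NameNode" does not
    # accidentally match the "SecondaryNameNode" line.
    printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
      echo "$daemon"
  done
}
```

Typical usage would be `jps | missing_daemons`; empty output suggests all the expected daemons are running.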
Step 4: Change the directory to the Flume installation directory ($FLUME_HOME, for example /usr/local/flume).
$ cd $FLUME_HOME
Step 5: Configure the netcat.conf file.
Here we configure the “netcat.conf” file. Copy the following into a file of that name in the Flume configuration folder (Step 6 reads it from $FLUME_CONF/netcat.conf).
# Naming the components on the current agent
NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks = LoggerSink
# Describing/Configuring the source
NetcatAgent.sources.Netcat.type = netcat
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56565
# Describing/Configuring the sink
NetcatAgent.sinks.LoggerSink.type = logger
# Describing/Configuring the channel
NetcatAgent.channels.MemChannel.type = memory
NetcatAgent.channels.MemChannel.capacity = 1000
NetcatAgent.channels.MemChannel.transactionCapacity = 100
# Bind the source and sink to the channel
NetcatAgent.sources.Netcat.channels = MemChannel
NetcatAgent.sinks.LoggerSink.channel = MemChannel
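A common mistake in a configuration like this is a channel name that the source or sink refers to but the agent never declares. The following is a rough sketch of a check for that, not a Flume tool: `conf_value` and `check_wiring` are hypothetical helper names, and the parsing only handles simple `key = value` lines like the ones above.

```bash
# conf_value FILE KEY
# Prints the value assigned to KEY in a properties file (last one wins).
conf_value() {
  awk -v key="$2" '
    index($0, "=") {
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
      if (k == key) val = v
    }
    END { print val }
  ' "$1"
}

# check_wiring FILE AGENT
# Succeeds only if every source and sink of AGENT points at a channel
# listed in AGENT.channels. Runs in a subshell so no variables leak.
check_wiring() (
  conf=$1 agent=$2 status=0
  channels=$(conf_value "$conf" "$agent.channels")

  declared() {
    [ -n "$1" ] || return 1
    case " $channels " in *" $1 "*) return 0 ;; *) return 1 ;; esac
  }

  for src in $(conf_value "$conf" "$agent.sources"); do
    src_channels=$(conf_value "$conf" "$agent.sources.$src.channels")
    [ -n "$src_channels" ] || { echo "source $src has no channels"; status=1; }
    for ch in $src_channels; do
      declared "$ch" || { echo "source $src: undeclared channel $ch"; status=1; }
    done
  done

  for sink in $(conf_value "$conf" "$agent.sinks"); do
    ch=$(conf_value "$conf" "$agent.sinks.$sink.channel")
    declared "$ch" || { echo "sink $sink: missing or undeclared channel '$ch'"; status=1; }
  done

  exit "$status"
)
```

For the file above, `check_wiring netcat.conf NetcatAgent` should succeed silently. Note that the source property is `channels` (plural) while the sink property is `channel` (singular), which is why the two loops read different keys.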
Step 6: Execution
$ cd $FLUME_HOME
$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/netcat.conf \
    --name NetcatAgent -Dflume.root.logger=INFO,console
Step 7: Passing Data to the Source
Here we pass data to the source through its port. Open a new terminal and connect using the command below; when the connection succeeds, a “connected” message is displayed.
$ curl telnet://localhost:56565
Note: The NetCat source receives data line by line. It treats each line as an individual event and replies with “OK” for each one.
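The line-in, “OK”-out exchange described in the note can also be scripted instead of typed by hand. This is a minimal sketch that relies on bash's `/dev/tcp` pseudo-device (a bash feature, not available in every shell); the function name `send_event` and the 5-second timeout are assumptions made for illustration.

```bash
# send_event HOST PORT LINE
# Sends one line (one Flume event) to a NetCat source and prints the
# acknowledgement it sends back. Runs in a subshell so that file
# descriptor 3 is closed automatically when the function returns.
send_event() (
  host=$1 port=$2 line=$3

  # Open a read/write TCP connection on file descriptor 3 (bash only).
  exec 3<>"/dev/tcp/${host}/${port}" || exit 1

  # One newline-terminated line becomes one event.
  printf '%s\n' "$line" >&3

  # Wait up to 5 seconds for the source's reply.
  IFS= read -r -t 5 reply <&3 || exit 1

  # Strip a trailing carriage return, if any, before printing.
  printf '%s\n' "${reply%$'\r'}"
)
```

With the agent from Step 6 running, `send_event localhost 56565 "hello flume"` should print the acknowledgement, and the event should appear on the agent's console. The function returns a non-zero status if the port is closed or no reply arrives in time.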
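Step 8: Verify in HDFS
With the logger sink configured in Step 5, each event is printed on the console of the terminal running the Flume agent. The listing below shows files only if the agent uses an HDFS sink that writes to this path.
$ hdfs dfs -ls /user/hduser1/flumedata/NetCat_data/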
That’s all about the Apache Flume NetCat source. Using the NetCat source, we can bring streaming data from network traffic, social media, and email messages into a big data system for analysis or storage. I hope this will be useful.