Learn Apache Flume

Apache Flume Course Introduction

Apache Flume is an open-source data collection tool used for moving data from a source to a destination. In this Apache Flume tutorial, we will study how Flume helps in streaming data from various sources and why Flume became so popular.

Every company runs many servers and applications, and those applications produce large amounts of data in the form of logs. To process those logs, we need a scalable, manageable, extensible, and reliable data collection tool that can move the data from where it is generated to where it will be processed (such as HDFS). Apache Flume is exactly such a tool.

Apache Flume is useful for moving large amounts of streaming data into the Hadoop Distributed File System (HDFS), and it is highly fault-tolerant and robust. Flume can collect data in both batch and streaming modes.
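To make the data flow concrete, here is a minimal sketch of a Flume agent configuration that wires a source, a channel, and a sink together. The agent name `a1` and the port number are illustrative choices; the component types (`netcat`, `memory`, `logger`) are standard Flume built-ins, with the `logger` sink standing in for an HDFS sink during testing.

```properties
# Name the components on this (hypothetical) agent "a1".
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# NetCat source: listens on a TCP port and turns each line of text into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# In-memory channel that buffers events between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Logger sink: writes events to the agent's log (convenient for testing;
# a production setup would typically use an "hdfs" sink instead).
a1.sinks.k1.type = logger

# Wire the source and sink to the channel.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

This one-agent layout (source → channel → sink) is the basic building block; agents can also be chained so that one agent's sink feeds another agent's source.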

Where Flume can Help?

Problem: As you can see below, we need to analyze a large volume of server logs using HDFS/Hadoop, but how can we send those logs to HDFS?

(Image: why Flume is required)

Solution:
Apache Flume is a highly scalable, manageable, extensible, and reliable data collection tool for systematically collecting and moving large amounts of streaming data into HDFS.
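In practice, once an agent configuration file has been written, the agent is started with the `flume-ng` command that ships with Flume. The file paths and the agent name `a1` below are illustrative assumptions; the flags themselves are standard:

```
# Start a Flume agent named "a1" from a configuration file
# (paths are examples; adjust to your installation).
flume-ng agent \
  --conf ./conf \
  --conf-file ./conf/flume.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```

The `-Dflume.root.logger=INFO,console` option sends the agent's log output to the console, which is handy while testing a new configuration.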

(Image: the need for Apache Flume)


Contents

1.         Apache Flume Introduction

1.1       Applications and  features of Apache Flume

1.2       Advantage and Disadvantage of Apache Flume

1.3       Apache Flume Architecture

2.         Data Flow mechanism in Apache Flume

3.         Apache Flume Configuration and Setup 

3.1       Apache Flume NetCat Source 

3.2       Apache Flume Sequence Generator source

4.         Flume Hands-On Streaming Twitter data

5.         Difference between Flume and Sqoop