What is Apache Sqoop

Let us take a look at Apache Sqoop:

* Apache Sqoop is an open-source tool in the Hadoop ecosystem.

* Sqoop is mainly designed to transfer large data sets between Hadoop and external data stores such as relational databases and enterprise data warehouses.

* The main functions of Apache Sqoop are:

1. Import data

2. Export data

* It imports data from relational databases (RDBMS) such as MySQL, Oracle, PostgreSQL and DB2 into the Hadoop Distributed File System (HDFS) and into related Hadoop systems such as Hive and HBase.

* It exports data from the Hadoop file system back to relational databases.

Figure: Sqoop tool workflow

* The diagram above shows the data flow mechanism in Sqoop.

* It is a command-line interface application for transferring data.
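Since Sqoop is driven from the command line, the two functions listed above (import and export) map onto two invocations, `sqoop import` and `sqoop export`. The following is a minimal Python sketch that only builds and prints those command lines; it does not run Sqoop. The helper names `sqoop_import_args` and `sqoop_export_args`, the JDBC URL, user name, table names and HDFS paths are all hypothetical values chosen for illustration. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`, `--export-dir`) are options of Sqoop 1.4.x; check them against the version you have installed.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, num_mappers=4):
    """Build the argument list for `sqoop import` (RDBMS table -> HDFS directory)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file that holds the database password
        "--table", table,                   # source table in the RDBMS
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir):
    """Build the argument list for `sqoop export` (HDFS directory -> RDBMS table)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                   # destination table, assumed to exist already
        "--export-dir", export_dir,         # HDFS directory that holds the data to export
    ]


# Hypothetical connection details, for illustration only.
import_cmd = sqoop_import_args(
    "jdbc:mysql://db.example.com/sales", "etl_user",
    "/user/etl/.db-password", "orders", "/data/raw/orders",
)
export_cmd = sqoop_export_args(
    "jdbc:mysql://db.example.com/sales", "etl_user",
    "/user/etl/.db-password", "order_summary", "/data/out/order_summary",
)

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(import_cmd))
print(shlex.join(export_cmd))
```

On a machine where Sqoop is installed, the same argument lists could be handed to `subprocess.run`. Keeping them as lists rather than one long string avoids shell quoting problems.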


Sqoop was originally developed by Cloudera. Later it was further developed and maintained under the Apache Software Foundation, hence the name Apache Sqoop. In April 2012, Apache Sqoop was promoted to an Apache top-level project.

An RDBMS is a common source of large relational data sets. As data volumes grew from gigabytes to terabytes and petabytes, Hadoop emerged as a platform to store, process and analyse such data. Moving large volumes of data from an RDBMS into the Hadoop file system requires a tool, and this is where Sqoop comes into the picture: it automates the process of importing and exporting the data.

Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL and PostgreSQL.

“Sqoop: SQL to Hadoop and Hadoop to SQL Tool”


In this article we learnt about Sqoop, which is used to import and export data between an RDBMS and Hadoop. Sqoop is designed specifically for structured data, and it is a boon for Hadoop developers, who can easily adopt it in their projects. Basic knowledge of SQL is sufficient to use Sqoop with Hadoop, as we will discuss in the next articles.



That's all for the “Sqoop introduction”. We will study Sqoop further in the next articles.