Features and limitations of Sqoop

Let’s discuss the features and limitations of Sqoop.


The salient features of Sqoop are:

1. Robust and community-supported

Sqoop is robust, easy to use, and backed by an active community of users and contributors.

2. Full load

With a single command (import-all-tables), Sqoop can load all the tables from a database.
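
As an illustration, the small Python helper below assembles the argument list for a Sqoop 1 full load. It is a sketch, not part of Sqoop. The JDBC URL, user name, warehouse directory, and the excluded table `audit_log` are hypothetical. The flags `--connect`, `--username`, `-P`, `--warehouse-dir` and `--exclude-tables` are standard `import-all-tables` options.

```python
from typing import List, Optional


def build_import_all_tables(jdbc_url: str, username: str, warehouse_dir: str,
                            exclude: Optional[List[str]] = None) -> List[str]:
    """Return the argv for a Sqoop 1 full load of every table in one database."""
    args = [
        "sqoop", "import-all-tables",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "-P",                              # prompt for the password at the console
        "--warehouse-dir", warehouse_dir,  # one sub-directory per table is created here
    ]
    if exclude:
        # comma-separated list of tables to skip
        args += ["--exclude-tables", ",".join(exclude)]
    return args


if __name__ == "__main__":
    # hypothetical host, database, user and paths
    argv = build_import_all_tables("jdbc:mysql://db.example.com:3306/shop",
                                   "etl_user", "/user/etl/shop",
                                   exclude=["audit_log"])
    print(" ".join(argv))
    # subprocess.run(argv, check=True) would launch the job where Sqoop is installed
```

The Sqoop user guide lists some conditions for `import-all-tables` to be useful. Each table needs a single-column primary key, or the job must fall back to one mapper. The tool also imports all columns, with no custom split column or WHERE filter.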

3. Incremental Load

Sqoop can import only the rows of a table that were added or updated since the last import, instead of reloading the whole table.
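
The sketch below simulates Sqoop's append mode on an in-memory table. In a real job the equivalent flags would be `--incremental append --check-column id --last-value 0`. A timestamp column would use `--incremental lastmodified` instead. The `orders` rows and the function name are hypothetical. The code only shows the selection rule: rows whose check column is greater than the last value are imported, and the largest value seen becomes the starting point for the next run.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def incremental_append(rows: List[Row], check_column: str,
                       last_value: int) -> Tuple[List[Row], int]:
    """Simulate the idea behind Sqoop's append mode on an in-memory table.

    Only rows whose check column is strictly greater than last_value are
    "imported"; the largest value seen becomes the last value for the next run.
    """
    new_rows = [r for r in rows if r[check_column] > last_value]
    next_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, next_last


if __name__ == "__main__":
    orders = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}]
    batch, last = incremental_append(orders, "id", 0)     # first run imports ids 1 and 2
    orders.append({"id": 3, "item": "pad"})               # the source table grows
    batch, last = incremental_append(orders, "id", last)  # second run imports only id 3
    print([r["id"] for r in batch], last)                 # prints [3] 3
```

In practice, a saved Sqoop job (`sqoop job --create ...`) can record the last value between runs, so it does not have to be tracked by hand.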

4. Parallel import/export

Sqoop imports and exports data in parallel by running each transfer as a MapReduce job on YARN, which also provides fault tolerance.
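
How is the work divided among parallel tasks? Sqoop reads the MIN and MAX of the split column and cuts that range into pieces, one per mapper. The split column is set by `--split-by` and defaults to the primary key. The number of mappers is set by `--num-mappers`/`-m` and defaults to 4. Each mapper then imports its own slice. The function below is a simplified even-split sketch of that idea, not Sqoop's actual splitter code, which handles more column types and computes boundaries its own way.

```python
from typing import List, Tuple


def split_range(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into at most num_mappers
    contiguous, non-overlapping (start, end) pairs."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []  # empty table: nothing to import
    total = hi - lo + 1
    n = min(num_mappers, total)  # never create more splits than keys
    base, extra = divmod(total, n)
    splits: List[Tuple[int, int]] = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first splits
        splits.append((start, start + size - 1))
        start += size
    return splits


if __name__ == "__main__":
    # hypothetical: SELECT MIN(id), MAX(id) returned 1 and 10, with 4 mappers
    for start, end in split_range(1, 10, 4):
        print(f"mapper reads: id >= {start} AND id <= {end}")
```

Because every mapper reads a disjoint key range, a badly skewed split column can leave one mapper with most of the rows. That is why the choice of `--split-by` matters.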

5. Import results of SQL query

Sqoop can also import the result of an arbitrary SQL query into HDFS.
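
A free-form query import uses `--query` together with `--target-dir` and either `--split-by` or a single mapper. The query's WHERE clause must contain the literal token `$CONDITIONS`, which Sqoop replaces with each mapper's range predicate. On a shell command line the query is usually wrapped in single quotes so the shell does not expand `$CONDITIONS`. The helper below only sketches that substitution. The query, the `o.id` split column and the hard-coded ranges are hypothetical.

```python
from typing import List, Tuple


def queries_for_splits(query: str, split_by: str,
                       splits: List[Tuple[int, int]]) -> List[str]:
    """Expand a free-form query into one query per mapper by replacing the
    $CONDITIONS placeholder with that mapper's range predicate."""
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query import must contain $CONDITIONS")
    return [
        query.replace("$CONDITIONS", f"{split_by} >= {lo} AND {split_by} <= {hi}")
        for lo, hi in splits
    ]


if __name__ == "__main__":
    q = ("SELECT o.id, c.name FROM orders o "
         "JOIN customers c ON o.cust_id = c.id WHERE $CONDITIONS")
    for per_mapper in queries_for_splits(q, "o.id", [(1, 500), (501, 1000)]):
        print(per_mapper)
```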

6. Compression

Large datasets can be compressed during import, for example with the Snappy codec (explained in the next article), and the compressed tables can be loaded into Hive.
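
The following sketch shows which flags combine compression with a Hive load. `--compress` (`-z`) turns compression on. `--compression-codec` selects the Hadoop codec class; gzip is used if it is omitted. `--hive-import` loads the result into Hive. The connection details and table names are hypothetical, and the function itself is only an illustration.

```python
from typing import List

SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"  # Hadoop's Snappy codec class


def build_compressed_hive_import(jdbc_url: str, username: str, table: str,
                                 hive_table: str) -> List[str]:
    """Return the argv for importing one table with Snappy compression and
    loading the result into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username, "-P",   # -P prompts for the password
        "--table", table,
        "--compress",                   # same as -z; enables output compression
        "--compression-codec", SNAPPY,  # without this, Sqoop defaults to gzip
        "--hive-import",                # load the imported files into Hive
        "--hive-table", hive_table,
    ]


if __name__ == "__main__":
    # hypothetical connection and table names
    print(" ".join(build_compressed_hive_import(
        "jdbc:mysql://db.example.com:3306/shop", "etl_user",
        "orders", "analytics.orders")))
```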

7. Connectors for all major RDBMS Databases

Sqoop provides connectors for many RDBMS databases, such as MySQL, PostgreSQL, Oracle, SQL Server, and DB2; these database-specific connectors can perform better than a generic JDBC connection.
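
Sqoop chooses its connector by looking at the prefix of the `--connect` string, and the vendor's JDBC driver must be available on Sqoop's classpath. The helper below builds connect strings in the usual URL formats and default ports for the five databases named above. The helper itself is hypothetical. For Oracle it assumes the SID form rather than a service name.

```python
from typing import Dict, Optional, Tuple

# URL template and default port per database
JDBC_TEMPLATES: Dict[str, Tuple[str, int]] = {
    "mysql":      ("jdbc:mysql://{host}:{port}/{db}", 3306),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{db}", 5432),
    "oracle":     ("jdbc:oracle:thin:@{host}:{port}:{db}", 1521),  # {db} is the SID here
    "sqlserver":  ("jdbc:sqlserver://{host}:{port};databaseName={db}", 1433),
    "db2":        ("jdbc:db2://{host}:{port}/{db}", 50000),
}


def jdbc_url(vendor: str, host: str, db: str, port: Optional[int] = None) -> str:
    """Build a --connect string whose vendor prefix lets Sqoop pick a
    database-specific connector; falls back to the vendor's default port."""
    try:
        template, default_port = JDBC_TEMPLATES[vendor]
    except KeyError:
        raise ValueError(f"no URL template for {vendor!r}") from None
    return template.format(host=host, port=port or default_port, db=db)


if __name__ == "__main__":
    for vendor in JDBC_TEMPLATES:
        print(jdbc_url(vendor, "db.example.com", "shop"))  # hypothetical host and database
```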

8. Kerberos Security Integration

Sqoop supports Kerberos authentication. Kerberos is a computer network authentication protocol that works on the basis of ‘tickets’, allowing nodes communicating over a non-secure network to prove their identity to one another in a secure manner.

9. Load data directly into Hive/HBase

We can load data directly into Hive for analysis, or into HBase, a NoSQL database.
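
The sketch below shows the extra arguments that redirect an ordinary `sqoop import` into Hive or into HBase. `--hive-import`/`--hive-table` create and load a Hive table. `--hbase-table`, `--column-family`, `--hbase-row-key` and `--hbase-create-table` write each row into HBase instead of HDFS files. The helper functions and every table, column-family and connection name used here are hypothetical.

```python
from typing import List


def hive_target(hive_table: str) -> List[str]:
    """Extra import arguments that load the data straight into Hive."""
    return ["--hive-import", "--hive-table", hive_table]


def hbase_target(hbase_table: str, column_family: str, row_key: str) -> List[str]:
    """Extra import arguments that write each row into HBase instead of HDFS files."""
    return [
        "--hbase-table", hbase_table,
        "--column-family", column_family,  # non-key columns are stored in this family
        "--hbase-row-key", row_key,        # source column used as the HBase row key
        "--hbase-create-table",            # create the HBase table if it does not exist
    ]


if __name__ == "__main__":
    base = ["sqoop", "import", "--connect", "jdbc:mysql://db.example.com:3306/shop",
            "--username", "etl_user", "-P", "--table", "customers"]
    print(" ".join(base + hive_target("analytics.customers")))
    print(" ".join(base + hbase_target("customers", "info", "id")))
```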

10. Support for Accumulo

We can instruct Sqoop to import a table into Accumulo rather than into a directory in HDFS.
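
In the same spirit, the Accumulo target is selected with its own group of import arguments. The helper below is a hypothetical sketch: the table, column family, ZooKeeper quorum, instance and user names are made up. Password handling (Sqoop also has an `--accumulo-password` option) is deliberately left out.

```python
from typing import List


def accumulo_target(table: str, column_family: str, zookeepers: str,
                    instance: str, user: str) -> List[str]:
    """Extra import arguments that write rows into an Accumulo table."""
    return [
        "--accumulo-table", table,
        "--accumulo-column-family", column_family,
        "--accumulo-zookeepers", zookeepers,  # comma-separated host:port list
        "--accumulo-instance", instance,
        "--accumulo-user", user,
        "--accumulo-create-table",            # create the table if it is missing
    ]


if __name__ == "__main__":
    print(" ".join(accumulo_target("orders", "cf", "zk1:2181,zk2:2181", "prod", "sqoop")))
```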

Limitations of Sqoop

The limitations of Sqoop are:

* A Sqoop job cannot be paused and resumed; it runs as a single atomic step. If it fails, we need to clean things up and start again.

* Sqoop export performance also depends on the hardware configuration (memory, hard disk) of the RDBMS server.

* Sqoop can be slow because it uses MapReduce for its backend processing.

* Failures need special handling in case of partial import or export.

* Sqoop provides faster bulk connectors for only a few databases.

* It uses a JDBC connection to connect to RDBMS-based data stores, which can be inefficient and slow.

* Sqoop does not provide a GUI (graphical user interface) for easy use.

Note: “Some of the limitations of Sqoop are resolved in Sqoop 2.”

What’s New in Sqoop 2

* Sqoop 2 comes with a GUI for easy use, in addition to the command line.

* It fixes security issues such as passwords being shared openly in queries.

* It provides easier debugging and better logging.

* It supports other connectors and does not follow the JDBC model.

* It provides server-side configuration.

That’s all about Sqoop 1 and 2 for now; we will study them in detail in the next article.