Sqoop Import Command Arguments 2


In this post we will discuss about one of the important commands in Apache Sqoop, Sqoop Import Command Arguments with examples. This documentation is applicable for sqoop versions 1.4.5 or later because earlier versions doesn’t support some of the below mentioned arguments to import command

As of Sqoop 1.4.5 version, Sqoop import command supports various number of arguments to import relational database tables into below tools or services.

  • HDFS
  • Hive
  • HBase
  • HCatalog
  • Accumulo

There are specific arguments for these services, apart from

  • Common arguments
  • Control arguments
  • Input parsing arguments
  • Output line formatting arguments
  • Incremental import arguments.

Lets discuss about each of these arguments with a brief explanation about each argument.

Sqoop Import Command Arguments

First, lets discuss about the common arguments to Sqoop Import command. Each argument in sqoop commands is prefixed by ‘–‘ (double-hyphen). In the below table only mandatory arguments are highlighted with red color in description column.

Common arguments

Argument Description & Example
–connect <jdbc-uri> To Specify JDBC connect string containing hostname or IP address (optionally port) followed by database name. It is mandatory argument.
–connect jdbc:mysql://localhost/test_db 
In this example we are connecting to MySQL On localhost and database name is test_db
–connection-manager <class>  Specify connection manager class name It is optional. Ex:
–connection-manager org.apache.sqoop.manager.GenericJdbcManager
–connection-param-file <file> optionally specify extra JDBC parameters via a property file using this option. This file is parsed as standard Java properties and passed into the driver while creating a connection.
 –driver <class> Manually specify JDBC driver class to use. Ex: –driver oracle.jdbc.OracleDriver 
 –hadoop-home <hdir> By default sqoop will look for $HADOOP_HOME to set to $HADOOP_COMMON_HOME. We can override this value with this argument.Specify this to use custom hadoop installation instead of default from $HADOOP_HOME 
 –hadoop-mapred-home <dir> Same as above for $HADOOP_MAPRED_HOME  Specify the path of new hadoop mapreduce installation directory.
–username & (–password or -P) We can specify the username and password of the RDBMS to retrieve the tables. Example: –username u1 –password 12345 

Notes

  • We should not use the URL localhost in the connect string, if we intend to use Sqoop with a distributed Hadoop cluster. The connect string we supply will be used on data nodes throughout our cluster; if we specify the literal name localhost, each node will connect to a different database on their localhosts (or more likely, no database at all). Instead, we should use the full hostname or IP address of the database host that can be seen by all our remote nodes.
  • The –password parameter is insecure, as other users may be able to read our password from the command-line arguments via the output of programs such as ps.                       The -P argument (prompts for user password) will read a password from a console prompt ,and is the preferred method of entering credentials. Credentials may still be transferred between nodes of the MapReduce cluster using insecure means. The most secure way is to use, –password-file <file containing the password> method. Set authentication password in this file on the users home directory with 400 permissions.
  • The jar containing the driver class (ex: com.mysql.jdbc.Driver) should be copied into $SQOOP_HOME/lib directory otherwise exception will be thrown.

Below is the example sqoop import command covering all the arguments discussed above.


Profile photo of Siva

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

2 thoughts on “Sqoop Import Command Arguments

  • Joseph

    Hey Siva,

    I have been trying to work with sqoop and i wanted to test connection to mysql and i tried your command
    sqoop list-databases –connect jdbc:mysql://localhost –connection-manager org.apache.sqoop.manager.GenericJdbcManager –driver com.mysql.jdbc.Driver –username root –password MYPASSWORD
    and i end up with an error
    15/09/25 14:25:03 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/09/25 14:25:03 INFO manager.SqlManager: Using default fetchSize of 1000
    15/09/25 14:25:03 ERROR manager.SqlManager: Generic SqlManager.listDatabases() not implemented.
    Could not retrieve database list from server
    15/09/25 14:25:03 ERROR tool.ListDatabasesTool: manager.listDatabases() returned null

    I have also put the jar file in /var/lib/sqoop ”
    mysql-connector-java-5.0.8-bin.jar”  and when i try it wothout specifying the –connection-manager it automatically defaults to GenericJDBCManager,  any idea why this isnt working ?
    Thanks
    -Joseph


Review Comments
default image

I am a plsql developer. Intrested to move into bigdata.

Neetika Singh ITA Hadoop in Dec/2016 December 22, 2016

.