Sqoop Export Commands

Below are some Sqoop export commands and other miscellaneous commands.

Sqoop-export

It is nothing but exporting data from HDFS to a database. To use the 'export' command, the target table must already exist in the database; only then will Sqoop's export functionality work.

The export command works in two ways:
1. insert
2. update

1. insert: insert mode inserts the new records from HDFS into the RDBMS table.

command:
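For illustration, a minimal export in insert mode might look like the following sketch (the database testdb, table emp, credentials, and HDFS path /user/hadoop/emp are placeholder names):

# Export the files under /user/hadoop/emp into the existing emp table (insert mode is the default)
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --export-dir /user/hadoop/emp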

Note: if a record with the same primary key is already present in the database table, the export will raise a MySQLIntegrityConstraintViolationException.

2. update: update mode updates the existing records in the RDBMS using the data in HDFS.

Update mode only updates records that already exist; it will not insert new records into the RDBMS.

command:
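A sketch of the same export in update mode, assuming the emp table has a primary key column named id (again, all names are placeholders):

# Update existing rows in emp, matching on the id column; updateonly is the default update mode
# (--update-mode allowinsert would additionally insert rows that do not exist yet)
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --export-dir /user/hadoop/emp \
  --update-key id \
  --update-mode updateonly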

Sqoop-JOB

A Sqoop job is nothing but a saved Sqoop command that can be executed any number of times when we require it. The job command allows you to create and work with saved jobs. Saved jobs remember the parameters used to specify a job, so they can be re-executed by invoking the job by its handle.

The following are Sqoop Job commands,

Create job:
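A sketch of creating a saved import job named myjob (the connection details and table are placeholders):

# Save an import command under the handle 'myjob' so it can be re-executed later
sqoop job --create myjob \
  -- import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --target-dir /user/hadoop/emp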

Note:
There should be a space between '--' and 'import'.

list jobs: it shows all the saved jobs.

inspect job: it shows the details of a saved job.

delete job: it deletes an existing job.

execute job: it executes the saved job. Sketches of all four appear below.
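For example, assuming the saved job is named myjob (a placeholder):

sqoop job --list          # show all saved jobs
sqoop job --show myjob    # inspect: show the job's details and saved parameters
sqoop job --delete myjob  # delete an existing job
sqoop job --exec myjob    # execute the job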

Sqoop-Merge

The merge command in Sqoop merges two existing tables (datasets) in HDFS into a single new table in HDFS.

Let us discuss the merge functionality with an example.

Below are the tables in the RDBMS:

Table 1: emp_1

Table 2: emp_2

now,

Importing emp_1 into HDFS
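A sketch of this import (connection details and target directory are placeholders):

# Import table emp_1 into HDFS under /user/hadoop/emp_1
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp_1 \
  --target-dir /user/hadoop/emp_1 \
  -m 1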

Importing emp_2 into HDFS
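And similarly for emp_2 (placeholder names again):

# Import table emp_2 into HDFS under /user/hadoop/emp_2
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp_2 \
  --target-dir /user/hadoop/emp_2 \
  -m 1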

After importing emp_1 and emp_2 into HDFS, Sqoop generates a jar file in our local file system. With the help of that jar we will be able to merge the two tables in HDFS.
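A sketch of the merge step, assuming emp_1 holds the older data, emp_2 the newer data, id is the common primary key, and emp_2.jar with class emp_2 were generated by the second import (the jar path is a placeholder):

# Merge the newer emp_2 data onto the older emp_1 data, keyed on id, writing a single merged dataset
sqoop merge \
  --new-data /user/hadoop/emp_2 \
  --onto /user/hadoop/emp_1 \
  --target-dir /user/hadoop/emp_merged \
  --jar-file /path/to/emp_2.jar \
  --class-name emp_2 \
  --merge-key id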

1. Create a database and table as shown below.
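A minimal sketch in MySQL (the database name testdb, table name emp, and column definitions are assumptions; the last_mod column is included so that updated rows can be identified later):

CREATE DATABASE testdb;
USE testdb;

-- emp table with a primary key (used by Sqoop for splitting) and a last-modified timestamp
CREATE TABLE emp (
  id       INT PRIMARY KEY,
  name     VARCHAR(50),
  sal      DOUBLE,
  last_mod TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);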

2. Insert 10,000+ records as shown below.

Note: I used mysqlimport to load the 10,000+ records, as shown below.
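A sketch of such a bulk load with the mysqlimport utility, assuming a comma-separated file /tmp/emp.csv whose name matches the table and whose columns match the table definition (the file name and options are placeholders):

# mysqlimport derives the table name (emp) from the file name emp.csv
mysqlimport --local --user=root --password \
  --fields-terminated-by=',' testdb /tmp/emp.csv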

3. Run the below Sqoop command to import the data from MySQL to the HDFS location.
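A sketch of the import (connection details and target directory are placeholders; with no -m option Sqoop defaults to 4 map tasks, which matches the 4 part files mentioned below):

# Import the emp table into HDFS; the default 4 map tasks split the rows on the primary key
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --target-dir /user/hadoop/emp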

It created 4 map tasks and stored the output in part files at the HDFS location.

4. Update some records randomly, as shown below.
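For example, a few arbitrary updates (the values are placeholders):

-- Touch a handful of rows; last_mod is updated automatically
UPDATE emp SET sal = sal + 500 WHERE id IN (10, 200, 3000);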

5. Use the below command to pick up the updated records and merge them into a single file.
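One way to do this is an incremental lastmodified import with a merge key, which re-imports only the changed rows and merges them with the data already in HDFS into a single consolidated output (a sketch, assuming the last_mod column and id primary key from the table above; the --last-value timestamp is a placeholder):

# Re-import rows changed since the last import and merge them on id into the existing target directory
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --target-dir /user/hadoop/emp \
  --incremental lastmodified \
  --check-column last_mod \
  --last-value "2021-01-01 00:00:00" \
  --merge-key id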

Sqoop-import-hive

Sqoop provides direct import into Hive. There are two ways to import into Hive:

1. create-hive-table: it is used when the data has already been imported into HDFS but not into Hive.
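A sketch (the source table and Hive table names are placeholders):

# Create an empty Hive table whose columns mirror the emp table in the database
sqoop create-hive-table \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --hive-table emp_hive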

With the above command, Sqoop creates an empty Hive table with the column properties of the database table. To load the data we need to use a Hive command, as shown below:
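For example, from the Hive shell (the HDFS path is a placeholder for wherever the data was imported):

-- Move the imported HDFS files into the Hive table created above
LOAD DATA INPATH '/user/hadoop/emp' INTO TABLE emp_hive;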

2. hive-import: it is a direct import from the RDBMS into Hive.
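A sketch of a direct Hive import (placeholder names):

# Import the emp table and load it straight into the Hive table emp_hive
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --username root --password root \
  --table emp \
  --hive-import \
  --hive-table emp_hive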

Sqoop Importing MySQL Data into HDFS

In this post, we will create a new database on the MySQL server, create a table and insert data into it, and then import that MySQL data into HDFS via the Sqoop tool.

We assume that MySQL, Sqoop, and Hadoop are installed on the local machine to test this example. We need to make sure that the MySQL JDBC driver (Connector/J) jar file is downloaded and copied into the SQOOP_HOME/lib directory.

Prepare MySQL Data

Log in to MySQL, create a new database and a new table, and load some sample data as shown below, in sequence. In this example we will create a sample table with the below columns:

  • first_name
  • last_name
  • company_name
  • address
  • country
  • city
  • state

Log in to MySQL as root:
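For example:

mysql -u root -p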

Create a new database test_db:
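For example:

CREATE DATABASE test_db;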

Grant all privileges on this database to all users at localhost (or at the IP addresses of the client machines; repeat the same command with different IP addresses if there are more client machines that need to interact with this MySQL server database).
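A sketch (the empty user name means any user; the IP address is a placeholder, and on newer MySQL versions the user must be created before privileges can be granted):

GRANT ALL PRIVILEGES ON test_db.* TO ''@'localhost';
GRANT ALL PRIVILEGES ON test_db.* TO ''@'192.168.1.10';  -- repeat per client IP address
FLUSH PRIVILEGES;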

MySQL Database Creation

As test_db is permitted to be used by any user@localhost, we can now log in to MySQL directly without any root credentials.

The sample user data input file used in this example is available here: userdata.

Bulk data load into MySQL table
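A sketch of both steps, assuming the sample file is at /tmp/userdata.csv and is comma-separated with columns in the order listed above (the column types are assumptions, and an id primary key is added because Sqoop splits the import on the primary key):

-- Create the table that will later be imported into HDFS
CREATE TABLE user_data (
  id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name   VARCHAR(50),
  last_name    VARCHAR(50),
  company_name VARCHAR(100),
  address      VARCHAR(200),
  country      VARCHAR(50),
  city         VARCHAR(50),
  state        VARCHAR(50)
);

-- Bulk load the sample CSV file
LOAD DATA LOCAL INFILE '/tmp/userdata.csv'
INTO TABLE user_data
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(first_name, last_name, company_name, address, country, city, state);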
Importing MySQL Data Into HDFS via Sqoop

 

Sqoop submits map-only jobs to the MapReduce framework for copying the data from relational databases to Hadoop. By default Sqoop submits 4 map tasks, and the input data from the MySQL table is divided equally across all 4 map tasks based on the primary key of the MySQL table.

As we are running on localhost, we have set the number of map tasks to 2 with the -m 2 option.

Submit the below Sqoop import command to copy the user_data table into the HDFS location /table/userdata.

Storing as Text file

Sqoop Import
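A sketch of the import as plain text files with 2 map tasks (the credentials are placeholders):

# Copy the user_data table into HDFS under /table/userdata using 2 map tasks
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --table user_data \
  --target-dir /table/userdata \
  -m 2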

Let's verify the output data in HDFS.

Verify Sqoop Import output
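For example:

# List the generated part files and peek at a few rows
hdfs dfs -ls /table/userdata
hdfs dfs -cat /table/userdata/part-m-00000 | head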

Using a File Format Other Than Text file

Storing as Sequence file

We need to use the --as-sequencefile option in the Sqoop import command to store the data as a sequence file.

Sqoop Import Sequencefile Output
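A sketch (the target directory is a placeholder):

# Same import, but stored as Hadoop sequence files
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --table user_data \
  --target-dir /table/userdata_seq \
  --as-sequencefile \
  -m 2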

Storing as Avrodata file

We need to use the --as-avrodatafile option in the Sqoop import command to store the data as an Avro data file.

Storing as Avro data file
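A sketch (the target directory is a placeholder):

# Same import, but stored as Avro data files
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --table user_data \
  --target-dir /table/userdata_avro \
  --as-avrodatafile \
  -m 2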

Importing based on a filter/criteria

For this we can use the --where "column='value'" option to select only a subset of the input based on a criteria or filter condition.

Sqoop Import with Where condition
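A sketch that filters on the country column (the filter value and target directory are placeholders):

# Import only the rows matching the where condition
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --table user_data \
  --where "country='United States'" \
  --target-dir /table/userdata_filtered \
  -m 2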

Compressing Imported Data

We need to use the --compress option along with --compression-codec and a codec class name. The class name can be any of the following:

  • org.apache.hadoop.io.compress.GzipCodec
  • org.apache.hadoop.io.compress.DefaultCodec
  • org.apache.hadoop.io.compress.BZip2Codec
  • org.apache.hadoop.io.compress.SnappyCodec

Gzipped Compress data
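A sketch using the Gzip codec (the target directory is a placeholder):

# Import and gzip-compress the output files
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --table user_data \
  --target-dir /table/userdata_gz \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.GzipCodec \
  -m 2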

Importing all the tables
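We can use sqoop import-all-tables for this; each table lands in its own subdirectory under --warehouse-dir, and each table needs a primary key for the parallel split (the names below are placeholders):

# Import every table in test_db into HDFS under /table/<table_name>
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --warehouse-dir /table \
  -m 2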

Excluding a few tables
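A sketch with the --exclude-tables option, which takes a comma-separated list of tables to skip (the table names here are placeholders):

# Import everything except the listed tables
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/test_db \
  --username root --password root \
  --warehouse-dir /table \
  --exclude-tables user_audit,user_temp \
  -m 2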