Sqoop: Importing MySQL Data into HDFS


In this post, we will create a new database on the MySQL server, create a table and insert sample data into it, and then import that MySQL data into HDFS via the Sqoop tool.

We assume that MySQL, Sqoop and Hadoop are installed on the local machine to test this example. We also need to make sure that the MySQL JDBC driver (Connector/J) jar file is downloaded and copied into the SQOOP_HOME/lib directory.
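For example (the connector jar name varies with the driver version you downloaded):

    $ cp mysql-connector-java-*-bin.jar $SQOOP_HOME/lib/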

Prepare MySQL Data

Log in to MySQL, create a new database and a new table, and load some sample data, in the sequence shown below. In this example we will create a sample table with the below columns:

  • first_name
  • last_name
  • company_name
  • address
  • country
  • city
  • state

Log in to MySQL as root:
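From a shell, a typical root login looks like this (you will be prompted for the root password):

    $ mysql -u root -p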

Create a new database test_db:
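At the mysql prompt:

    mysql> CREATE DATABASE test_db;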

Grant all privileges on this database to all users at localhost (or to the IP addresses of the client machines; repeat the same command for each additional IP address if more client machines need to interact with this MySQL server database).
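One way to do this is an anonymous-user grant for localhost (a sketch; substitute a specific user or host, and repeat the GRANT for each client IP address, as your MySQL version and security policy require):

    mysql> GRANT ALL PRIVILEGES ON test_db.* TO ''@'localhost';
    mysql> FLUSH PRIVILEGES;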


As test_db is now permitted to be used by any user at localhost, we can log in to MySQL directly without root credentials.
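For example (the mysql client connects as the current OS user, which the anonymous grant above covers):

    $ mysql test_db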

The sample user data input file used in this example is available for download here: userdata.

Bulk data load into MySQL table
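A sketch of the table creation and bulk load, assuming the downloaded file is saved as userdata.csv, is comma-separated with a header row, and adding an auto-increment id column as the primary key (Sqoop uses the primary key to split the import later):

    -- Sample table for the user data
    CREATE TABLE user_data (
      id           INT AUTO_INCREMENT PRIMARY KEY,
      first_name   VARCHAR(50),
      last_name    VARCHAR(50),
      company_name VARCHAR(100),
      address      VARCHAR(200),
      country      VARCHAR(50),
      city         VARCHAR(50),
      state        VARCHAR(50)
    );

    -- Bulk load the file (file name, delimiter and header row are assumptions;
    -- LOAD DATA LOCAL may require local_infile to be enabled)
    LOAD DATA LOCAL INFILE 'userdata.csv'
    INTO TABLE user_data
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
    (first_name, last_name, company_name, address, country, city, state);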
Importing MySQL Data Into HDFS via Sqoop


Sqoop submits map-only jobs to the MapReduce framework to copy data from relational databases to Hadoop. By default, Sqoop launches 4 map tasks, and the input data from the MySQL table is divided equally across the 4 map tasks based on the primary key of the MySQL table.

As we are running on localhost, we have set the number of map tasks to 2 with the -m 2 option.

Submit the below Sqoop import command to copy the user_data table into the HDFS location /table/userdata.

Storing as Text file
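By default Sqoop writes plain text files. A sketch of the import command (the JDBC URL assumes MySQL on localhost; add --username and -P if your setup requires credentials instead of the anonymous localhost grant created above):

    $ sqoop import \
        --connect jdbc:mysql://localhost/test_db \
        --table user_data \
        --target-dir /table/userdata \
        -m 2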


Let's verify the output data in HDFS.
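With -m 2, the map-only job produces two part files under the target directory; for example:

    $ hdfs dfs -ls /table/userdata
    $ hdfs dfs -cat /table/userdata/part-m-00000 | head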


Using a File Format Other Than Text file

Storing as Sequence file

We need to use the --as-sequencefile option in the sqoop import command to store the data as a sequence file.
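For example (a separate target directory is used here, an arbitrary choice, since the import target must not already exist):

    $ sqoop import \
        --connect jdbc:mysql://localhost/test_db \
        --table user_data \
        --target-dir /table/userdata_seq \
        --as-sequencefile \
        -m 2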


Storing as Avro data file

We need to use the --as-avrodatafile option in the sqoop import command to store the data as an Avro data file.
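For example (again with its own target directory, /table/userdata_avro here, an arbitrary choice):

    $ sqoop import \
        --connect jdbc:mysql://localhost/test_db \
        --table user_data \
        --target-dir /table/userdata_avro \
        --as-avrodatafile \
        -m 2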


Importing based on a filter/criteria

For this we can use the --where "column='value'" option to select only a subset of the input rows based on a filter condition.
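A sketch with a hypothetical filter on the country column (any valid SQL condition on the table's columns works):

    $ sqoop import \
        --connect jdbc:mysql://localhost/test_db \
        --table user_data \
        --where "country = 'United States'" \
        --target-dir /table/userdata_filtered \
        -m 2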


Compressing Imported Data

We need to use the --compress option along with --compression-codec and a codec class name. The class name can be any of the below; an example using the Gzip codec follows the list.

  • org.apache.hadoop.io.compress.GzipCodec
  • org.apache.hadoop.io.compress.DefaultCodec
  • org.apache.hadoop.io.compress.BZip2Codec
  • org.apache.hadoop.io.compress.SnappyCodec
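A sketch using the Gzip codec (the target directory is again an arbitrary choice):

    $ sqoop import \
        --connect jdbc:mysql://localhost/test_db \
        --table user_data \
        --target-dir /table/userdata_gz \
        --compress \
        --compression-codec org.apache.hadoop.io.compress.GzipCodec \
        -m 2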


Importing all the tables
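The sqoop import-all-tables tool copies every table in the database into HDFS, one subdirectory per table. A sketch (the warehouse directory /tables is an assumption; each table should have a primary key for splitting, or use -m 1):

    $ sqoop import-all-tables \
        --connect jdbc:mysql://localhost/test_db \
        --warehouse-dir /tables \
        -m 2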

Excluding a few tables
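To skip specific tables, import-all-tables accepts --exclude-tables with a comma-separated list (the table names here are placeholders):

    $ sqoop import-all-tables \
        --connect jdbc:mysql://localhost/test_db \
        --warehouse-dir /tables \
        --exclude-tables table1,table2 \
        -m 2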

