In this post, we will create a new database in the MySQL server, create a table, insert data into it, and then import that MySQL data into HDFS using the Sqoop tool.
We assume that MySQL, Sqoop, and Hadoop are installed on the local machine to test this example. Make sure the MySQL JDBC connector jar file is downloaded and copied into the SQOOP_HOME/lib directory.
Prepare MySQL Data
Log in to MySQL, create a new database and a new table, and load some sample data, in the sequence shown below. In this example we will create a sample table with the below columns:
Log in to MySQL as root:
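From a terminal (the -p flag prompts for the root password):

```shell
# Log in to the MySQL shell as the root user
mysql -u root -p
```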
Create a new database test_db:
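From the MySQL prompt:

```sql
-- Create the database used throughout this example
CREATE DATABASE test_db;
```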
Grant all privileges on this database to all users at localhost (or at the IP addresses of the client machines; repeat the same command with a different IP address for each additional client machine that needs to interact with this MySQL server).
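A minimal sketch of the grant, using the anonymous-user syntax supported by classic MySQL versions; substitute a concrete user name or a client machine's IP address as needed:

```sql
-- Allow any user connecting from localhost to use test_db;
-- repeat with a client IP in place of 'localhost' for remote machines
GRANT ALL PRIVILEGES ON test_db.* TO ''@'localhost';
FLUSH PRIVILEGES;
```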
As test_db is now permitted to be used by any user@localhost, we can log in to MySQL directly without root credentials.
The sample user data input file used in this example is available here –> userdata.
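The table layout below is a hypothetical sketch (the actual columns come from the userdata file); it shows the general shape of creating the user_data table and loading the sample file into it:

```sql
-- Hypothetical schema: adjust column names/types to match the userdata file
CREATE TABLE user_data (
  id         INT NOT NULL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  city       VARCHAR(50)
);

-- Load the sample file; adjust the field terminator to match its format
LOAD DATA LOCAL INFILE 'userdata'
INTO TABLE user_data
FIELDS TERMINATED BY ',';
```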
Importing MySQL Data Into HDFS via Sqoop
Sqoop submits map-only jobs to the MapReduce framework to copy data from relational databases into Hadoop. By default, Sqoop runs 4 map tasks, and the input data from the MySQL table is divided equally across the map tasks based on the table's primary key.
As we are running on localhost, we set the number of map tasks to 2 with the -m 2 option.
Submit the below Sqoop import command to copy the user_data table into the HDFS location /table/userdata.
Storing as Text file
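A sketch of the import command, assuming MySQL runs on localhost and the table is named user_data (plain text is Sqoop's default output format):

```shell
# Import user_data from test_db into HDFS as text files, using 2 map tasks
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --table user_data \
  --target-dir /table/userdata \
  -m 2
```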
Let's verify the output data in HDFS:
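With 2 map tasks, the target directory should contain two part files:

```shell
# List the part files written by the map tasks, then print their contents
hdfs dfs -ls /table/userdata
hdfs dfs -cat /table/userdata/part-m-*
```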
Using a File Format Other Than Text file
Storing as Sequence file
We need to use the --as-sequencefile option in the sqoop import command to store the output as a sequence file.
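The same import command with --as-sequencefile added; the separate target directory name here is just an assumption:

```shell
# Import user_data into HDFS as Hadoop SequenceFiles
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --table user_data \
  --target-dir /table/userdata_seq \
  --as-sequencefile \
  -m 2
```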
Storing as Avrodata file
We need to use the --as-avrodatafile option in the sqoop import command to store the output as an Avro data file.
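Likewise with --as-avrodatafile; again the target directory name is an assumption:

```shell
# Import user_data into HDFS as Avro data files
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --table user_data \
  --target-dir /table/userdata_avro \
  --as-avrodatafile \
  -m 2
```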
Importing based on a filter/criteria
For this we can use the --where "column='value'" option to select only a subset of the input rows based on a criteria or filter condition.
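For example, assuming a hypothetical city column exists in the user_data table:

```shell
# Import only the rows matching the filter; 'city' is a hypothetical column
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --table user_data \
  --where "city='Chicago'" \
  --target-dir /table/userdata_filtered \
  -m 2
```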
Compressing Imported Data
We need to use the --compress option along with --compression-codec and a codec class name. Common codec classes include org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, and org.apache.hadoop.io.compress.SnappyCodec.
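A sketch using the gzip codec (the target directory name is an assumption):

```shell
# Import user_data with gzip-compressed output files
sqoop import \
  --connect jdbc:mysql://localhost/test_db \
  --table user_data \
  --target-dir /table/userdata_gz \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.GzipCodec \
  -m 2
```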