Sqoop Interview Cheat Sheet 1


Install sqoop

sudo yum install sqoop (RHEL/CentOS)

  1. sudo apt-get install sqoop (Debian/Ubuntu)
  2. Sqoop is run from the normal command prompt (it has no interactive shell).
  3. Sqoop config file: sqoop-site.xml

install jdbc drivers

After you’ve obtained the driver, you need to copy the driver’s JAR file(s) into Sqoop’s lib/ directory. If you’re using the Sqoop tarball, copy the JAR files directly into the lib/ directory after unzipping the tarball. If you’re using packages, you will need to copy the driver files into the /usr/lib/sqoop/lib directory.
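For example, to install the MySQL Connector/J driver on a packaged install (the JAR name below is illustrative; use the version you downloaded):

# copy the JDBC driver JAR into Sqoop's lib directory
sudo cp mysql-connector-java-*.jar /usr/lib/sqoop/lib/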

sqoop list-databases
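A minimal example, assuming a MySQL server; the connection string and credentials are placeholders:

# list all databases visible to the given user (-P prompts for the password)
sqoop list-databases \
  --connect jdbc:mysql://localhost/ \
  --username sqoop -P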

import

sqoop import
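A typical invocation, shown as a sketch with placeholder connection details and the employees table used in this example:

# import the employees table into HDFS (4 parallel map tasks by default)
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees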

*results are imported to HDFS under the user’s home directory, e.g. /user/cloudera/employees

In the employees directory 4 files are created (one per mapper):

  • part-m-00000
  • part-m-00001
  • part-m-00002
  • part-m-00003

import all
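A sketch using import-all-tables, which imports every table in the database; connection details and the warehouse directory are placeholders:

# import all tables, each into its own subdirectory under the warehouse dir
sqoop import-all-tables \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --warehouse-dir /user/cloudera/all_tables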

 

mappers (-m 1)

sqoop import
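A sketch that limits the job to a single mapper (placeholder connection details):

# -m 1 (or --num-mappers 1) runs a single map task, producing one part-m-00000 file
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees \
  -m 1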

 

Subset of Data
where

sqoop import
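A sketch using --where to import only rows matching a condition; the condition and connection details are illustrative:

# only rows satisfying the WHERE condition are transferred
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees \
  --where "hire_date > '2000-01-01'"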

 

part-m-00000 contains 179,973 records.
When using the --where parameter, keep in mind the parallel nature of Sqoop transfers.
Data will be transferred in several concurrent tasks. Any expensive function call will put a significant performance burden on your database server. Advanced functions could lock certain tables, preventing Sqoop from transferring data in parallel. This will adversely affect transfer performance. For efficient advanced filtering, run the filtering query on your database prior to import, save its output to a temporary table, and run Sqoop to import the temporary table into Hadoop without the --where parameter.
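A sketch of that approach; the temporary table name is illustrative and is assumed to have been created on the database side first (e.g. with CREATE TABLE ... AS SELECT ...):

# import the pre-filtered temporary table without --where
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees_filtered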

boundary queries
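A sketch using --boundary-query to supply the minimum and maximum values Sqoop uses to split the work among mappers; the table, split column, and connection details are illustrative:

# the boundary query must return exactly two values: the lower and upper bound of the split column
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees \
  --split-by emp_no \
  --boundary-query "SELECT MIN(emp_no), MAX(emp_no) FROM employees"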

By default, when using the --compress parameter, output files will be compressed using the GZip codec, and all files will end up with a .gz extension.
You can choose any other codec using the --compression-codec parameter.

other compression codecs
  • bzip2: .bz2
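A sketch enabling compression with the BZip2 codec (a standard Hadoop codec class); connection details are placeholders:

# --compress enables compression; --compression-codec selects a non-default codec
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_db \
  --username sqoop -P \
  --table employees \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.BZip2Codec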