Sqoop Interview Cheat Sheet 1

Install sqoop

sudo yum install sqoop

  1. sudo apt-get install sqoop
  2. Sqoop runs from the normal command prompt
  3. Sqoop config file: sqoop-site.xml

Install JDBC drivers

After you’ve obtained the driver, you need to copy the driver’s JAR file(s) into Sqoop’s lib/ directory. If you’re using the Sqoop tarball, copy the JAR files directly into the lib/ directory after unzipping the tarball. If you’re using packages, you will need to copy the driver files into the /usr/lib/sqoop/lib directory.
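
The copy step can be sketched as a small shell helper. This is only an illustration: the function name install_jdbc_driver and the JAR file name in the usage note are hypothetical, and the default target is the package-install path mentioned above.

```shell
# Copy a JDBC driver JAR into Sqoop's lib directory.
# $1 = path to the driver JAR (hypothetical name, supplied by you)
# $2 = Sqoop lib directory (defaults to the package-install location)
install_jdbc_driver() {
  jar="$1"
  lib_dir="${2:-/usr/lib/sqoop/lib}"

  # Fail early with a readable message instead of a bare cp error.
  [ -f "$jar" ] || { echo "driver JAR not found: $jar" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "Sqoop lib directory not found: $lib_dir" >&2; return 1; }

  cp "$jar" "$lib_dir/"
}
```

For a package install this would be called as `sudo`-level work, for example `install_jdbc_driver mysql-connector-java.jar` run as a user who can write to /usr/lib/sqoop/lib. For a tarball install, pass the unpacked lib/ directory as the second argument.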

sqoop list-databases
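
A minimal sketch of the full command, assuming a MySQL server on localhost and a hypothetical account named sqoop_user (neither appears in these notes). The wrapper function list_databases is only there so the host and user can be overridden through environment variables.

```shell
# List the databases visible to the given account.
# DB_HOST and DB_USER are hypothetical overrides; the defaults are placeholders.
list_databases() {
  # -P prompts for the password instead of exposing it on the command line.
  sqoop list-databases \
    --connect "jdbc:mysql://${DB_HOST:-localhost}/" \
    --username "${DB_USER:-sqoop_user}" \
    -P
}
```

Run it with `list_databases`, or with `DB_HOST=dbserver list_databases` to point at another server.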


sqoop import

* Results are imported to HDFS under cloudera-employees.

In the employees directory, 4 files are created (one per mapper; Sqoop uses 4 mappers by default):

  • part-m-00000
  • part-m-00001
  • part-m-00002
  • part-m-00003
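
An import that produces output like this can be sketched as follows. The connection string, the sqoop_user account, and the wrapper name import_table are assumptions for illustration; only the employees table name comes from the notes above.

```shell
# Import one table into HDFS with Sqoop's default settings.
# $1 = table name; DB_HOST, DB_NAME, DB_USER are hypothetical overrides.
import_table() {
  # No --num-mappers is given, so Sqoop falls back to its default of 4 mappers,
  # and each mapper writes one part-m-* file.
  sqoop import \
    --connect "jdbc:mysql://${DB_HOST:-localhost}/${DB_NAME:-employees}" \
    --username "${DB_USER:-sqoop_user}" \
    -P \
    --table "$1"
}
```

Calling `import_table employees` writes to a directory named after the table under the importing user's HDFS home directory, unless --target-dir or --warehouse-dir is added.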

import all
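
To import every table in a database, Sqoop provides the import-all-tables tool. The sketch below assumes the same hypothetical MySQL connection details as a single-table import; the wrapper name import_all_tables is illustrative.

```shell
# Import every table of one database, each into its own HDFS directory.
# DB_HOST, DB_NAME, DB_USER are hypothetical overrides.
import_all_tables() {
  # No --table option here: the tool walks all tables in the database.
  sqoop import-all-tables \
    --connect "jdbc:mysql://${DB_HOST:-localhost}/${DB_NAME:-employees}" \
    --username "${DB_USER:-sqoop_user}" \
    -P
}
```

Tables without a single-column primary key may need extra handling (for example the --autoreset-to-one-mapper option), so check the schema before relying on this.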


mappers (-m 1)

sqoop import
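
A sketch of an import restricted to a single mapper, again with hypothetical connection details and a hypothetical wrapper name. --num-mappers is the long form of -m.

```shell
# Import one table using exactly one map task.
# $1 = table name; DB_HOST, DB_NAME, DB_USER are hypothetical overrides.
import_single_mapper() {
  # With one mapper there is no parallel split, so the output is a
  # single part-m-00000 file.
  sqoop import \
    --connect "jdbc:mysql://${DB_HOST:-localhost}/${DB_NAME:-employees}" \
    --username "${DB_USER:-sqoop_user}" \
    -P \
    --table "$1" \
    --num-mappers 1
}
```

For example, `import_single_mapper employees`. A single mapper is also the simplest way to import a table that has no primary key to split on.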


Subset of Data

sqoop import
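
A subset import with --where might look like the sketch below. The connection details, the wrapper name import_subset, and the hire_date condition in the usage note are all hypothetical.

```shell
# Import only the rows of a table that match a SQL condition.
# $1 = table name, $2 = condition (the text that would follow WHERE)
# DB_HOST, DB_NAME, DB_USER are hypothetical overrides.
import_subset() {
  # The condition is passed as one quoted argument so the shell
  # does not split it or interpret characters such as > and '.
  sqoop import \
    --connect "jdbc:mysql://${DB_HOST:-localhost}/${DB_NAME:-employees}" \
    --username "${DB_USER:-sqoop_user}" \
    -P \
    --table "$1" \
    --where "$2"
}
```

Example call: `import_subset employees "hire_date > '1990-01-01'"`.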


part-m-00000: 179973 records.
When using the --where parameter, keep in mind the parallel nature of Sqoop transfers. Data will be transferred in several concurrent tasks. Any expensive function call will put a significant performance burden on your database server. Advanced functions could lock certain tables, preventing Sqoop from transferring data in parallel. This will adversely affect transfer performance. For efficient advanced filtering, run the filtering query on your database prior to import, save its output to a temporary table, and run Sqoop to import the temporary table into Hadoop without the --where parameter.
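
The filter-first approach can be sketched as two steps. Everything specific here is an assumption: a MySQL database reachable with the mysql client, a hypothetical staging table employees_filtered, and hypothetical columns hire_date and emp_no. The sketch uses a regular table rather than a session-scoped TEMPORARY table, since a table private to one database session would not be visible to Sqoop's own connections.

```shell
# Stage the filtered rows in the database, then import the staging table.
# DB_HOST, DB_NAME, DB_USER are hypothetical overrides.
stage_and_import() {
  host="${DB_HOST:-localhost}"
  db="${DB_NAME:-employees}"
  user="${DB_USER:-sqoop_user}"

  # 1. Run the expensive filter once, inside the database.
  mysql -h "$host" -u "$user" -p \
    -e "CREATE TABLE employees_filtered AS SELECT * FROM employees WHERE hire_date > '1990-01-01'" \
    "$db" || return 1

  # 2. Import the staging table with no filter clause.
  #    A table created this way has no primary key, so a split column
  #    is named explicitly to keep the import parallel.
  sqoop import \
    --connect "jdbc:mysql://$host/$db" \
    --username "$user" \
    -P \
    --table employees_filtered \
    --split-by emp_no
}
```

Remember to drop the staging table once the import has been verified.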

boundary queries