Installing Sqoop:
- sudo yum install sqoop (on RHEL/CentOS)
- sudo apt-get install sqoop (on Debian/Ubuntu)
- Sqoop is run from a normal command prompt
- Sqoop config file: sqoop-site.xml
Install JDBC drivers
After you’ve obtained the driver, copy its JAR file(s) into Sqoop’s lib/ directory. If you’re using the Sqoop tarball, copy the JAR files directly into the lib/ directory after extracting the tarball. If you’re using packages, copy the driver files into the /usr/lib/sqoop/lib directory.
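As a rough sketch, installing a MySQL Connector/J driver into the packaged layout might look like this (the JAR file name and version here are assumptions — use whatever driver JAR you actually downloaded):

```shell
# Assumed JAR name/version -- substitute the driver JAR you obtained.
sudo cp mysql-connector-java-5.1.49.jar /usr/lib/sqoop/lib/

# Confirm the driver is now where Sqoop will pick it up
ls /usr/lib/sqoop/lib/ | grep mysql
```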
Results are imported to HDFS under cloudera-employees. In the employees directory, 4 files are created (one part file per map task; Sqoop runs 4 parallel map tasks by default).
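A minimal import that produces this layout could be sketched as follows (the connection string, credentials, and database name are illustrative assumptions, not values from the notes):

```shell
# Hypothetical connection details -- adjust for your own database.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_test \
  --username sqoop --password sqoop \
  --table employees

# Inspect the part files written by the (default) 4 map tasks
hadoop fs -ls employees
```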
Subset of Data
When using the --where parameter, keep in mind the parallel nature of Sqoop transfers. Data will be transferred in several concurrent tasks. Any expensive function call will put a significant performance burden on your database server. Advanced functions could lock certain tables, preventing Sqoop from transferring data in parallel. This will adversely affect transfer performance. For efficient advanced filtering, run the filtering query on your database prior to import, save its output to a temporary table, and run Sqoop to import the temporary table into Hadoop without the --where parameter.
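A simple --where filter might look like the following sketch (the column name, value, and connection details are assumptions for illustration):

```shell
# The WHERE condition is appended to each mapper's SELECT,
# so keep it cheap -- avoid expensive or locking functions.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_test \
  --username sqoop --password sqoop \
  --table employees \
  --where "hire_date > '2010-01-01'"
```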
By default, when using the --compress parameter, output files will be compressed using the GZip codec, and all files will end up with a .gz extension. You can choose any other codec using the --compression-codec parameter.
Other compression codecs available in Hadoop include BZip2 (org.apache.hadoop.io.compress.BZip2Codec) and Snappy (org.apache.hadoop.io.compress.SnappyCodec).
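For instance, a BZip2-compressed import could be sketched as follows (connection details and table name are again illustrative assumptions):

```shell
# Compress output part files with BZip2 instead of the default GZip;
# the fully qualified codec class is passed to --compression-codec.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_test \
  --username sqoop --password sqoop \
  --table employees \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.BZip2Codec
```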