This post collects practical Sqoop interview questions and answers for experienced Hadoop developers.
Sqoop Interview Questions and Answers for Experienced
1. What is Sqoop?
Sqoop is an open-source tool for transferring bulk data between the Hadoop ecosystem and relational databases.
2. What are the relational databases supported in Sqoop?
The RDBMSs currently supported by Sqoop include:
- MySQL
- PostgreSQL
- Oracle
- Microsoft SQL Server
- IBM's Netezza
3. What are the destination types allowed in the Sqoop import command?
Currently, Sqoop can import data into HDFS, Hive, and HBase.
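As a rough sketch, the same import could be pointed at each of the destinations this post mentions (HDFS, Hive, HBase) by changing a few arguments. The JDBC URL, user name, table, path, and column family below are hypothetical placeholders. The snippet only prints the commands so they can be reviewed before being run on a machine with Sqoop installed.

```shell
# Hypothetical connection details; replace with your own.
JDBC_URL="jdbc:mysql://db.example.com/testdb"
BASE="sqoop import --connect $JDBC_URL --username sqoop_user -P --table employees"

# HDFS: write the records as files under a target directory.
TO_HDFS="$BASE --target-dir /user/hadoop/employees"
# Hive: load the imported data into a Hive table.
TO_HIVE="$BASE --hive-import --hive-table employees"
# HBase: write rows into an HBase table and column family.
TO_HBASE="$BASE --hbase-table employees --column-family info --hbase-create-table"

# Print the commands for review instead of executing them.
printf '%s\n' "$TO_HDFS" "$TO_HIVE" "$TO_HBASE"
```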
4. Is Sqoop similar to distcp in Hadoop?
Partially, yes. Hadoop's distcp command is similar to the Sqoop import command in that both submit parallel map-only jobs. However, distcp copies files of any type from the local FS or HDFS into HDFS, whereas Sqoop transfers only data records between an RDBMS and Hadoop ecosystem services such as HDFS, Hive, and HBase.
5. What are the most commonly used commands in Sqoop?
The import and export commands are used most. Other commands, such as list-databases, list-tables, eval, codegen, job, and merge, are also useful at times.
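A few of the less common Sqoop tools are handy for checking a connection before a full import. The sketch below assumes a hypothetical MySQL server, user, and table. It prints three such commands (list-databases, list-tables, and eval) rather than running them.

```shell
# Hypothetical connection details; replace with your own.
SERVER_URL="jdbc:mysql://db.example.com/"
JDBC_URL="jdbc:mysql://db.example.com/testdb"

# List the databases visible on the server.
LIST_DBS="sqoop list-databases --connect $SERVER_URL --username sqoop_user -P"
# List the tables in one database.
LIST_TABLES="sqoop list-tables --connect $JDBC_URL --username sqoop_user -P"
# Run a quick SQL statement and print the result to the console.
EVAL_COUNT="sqoop eval --connect $JDBC_URL --username sqoop_user -P --query 'SELECT COUNT(*) FROM employees'"

# Print the commands for review instead of executing them.
printf '%s\n' "$LIST_DBS" "$LIST_TABLES" "$EVAL_COUNT"
```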
6. When importing tables from MySQL, what precautions need to be taken with respect to access?
In MySQL, we need to make sure that all privileges on the databases to be accessed have been granted to the users connecting from the destination hostname. If Sqoop runs on localhost and MySQL is on the same machine, we can grant the permissions with GRANT statements from the MySQL shell, logged in as the root user.
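For the same-machine case, a grant might look like the sketch below. The account name sqoop_user and the database testdb are hypothetical, and the grant is deliberately narrower than "all users on all databases", which is an assumption made here for safety. The snippet prints the SQL; the commented line shows one way to send it to MySQL as root.

```shell
# SQL for the MySQL shell, run as root. The account and database names are
# hypothetical. The account must already exist: MySQL 8 and later no longer
# create accounts implicitly through GRANT.
GRANT_SQL="GRANT ALL PRIVILEGES ON testdb.* TO 'sqoop_user'@'localhost';"

# Print the statement for review.
echo "$GRANT_SQL"

# On the MySQL host, it could then be applied with:
# mysql -u root -p -e "$GRANT_SQL"
```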
7. What if my MySQL server is running on MachineA and Sqoop is running on MachineB for the above question?
From MachineA, log in to the MySQL shell as the root user and grant the privileges to users connecting from MachineB. If MachineB is identified by hostname rather than by IP address, that hostname should be added to the /etc/hosts file on MachineA.
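A sketch of the two-machine case follows. The hostname, IP address, account, and database are all hypothetical placeholders, and the account is assumed to exist already. The snippet prints the GRANT statement to run on MachineA and the /etc/hosts line that would let MachineA resolve MachineB's hostname.

```shell
# Hypothetical hostname of MachineB, where Sqoop runs.
SQOOP_HOST="machineb.example.com"

# Grant access to the account when it connects from MachineB.
GRANT_SQL="GRANT ALL PRIVILEGES ON testdb.* TO 'sqoop_user'@'$SQOOP_HOST';"
echo "$GRANT_SQL"

# Entry for /etc/hosts on MachineA (the IP is a documentation placeholder).
HOSTS_LINE="192.0.2.20 $SQOOP_HOST"
echo "$HOSTS_LINE"
```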
8. How many MapReduce jobs and tasks are submitted for a Sqoop copy into HDFS?
Each Sqoop copy into HDFS submits a single MapReduce job, which runs 4 map tasks by default. No reduce tasks are scheduled.
9. How can we control the parallel copying of RDBMS tables into Hadoop?
We can speed up or slow down the copy by configuring the number of map tasks run for each Sqoop copy. This is done by passing the -m 10 or --num-mappers 10 argument to the sqoop import command. With -m 10, Sqoop runs 10 map tasks in parallel. We can raise or lower this number as required to control the copy speed.
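The mapper count described above can be sketched as follows. The connection details and table name are hypothetical, and 10 is simply the value used in the answer. The command is printed rather than executed.

```shell
# Hypothetical connection details; replace with your own.
JDBC_URL="jdbc:mysql://db.example.com/testdb"
MAPPERS=10   # number of parallel map tasks; -m is the short form

CMD="sqoop import --connect $JDBC_URL --username sqoop_user -P --table employees --num-mappers $MAPPERS"

# Print the command for review instead of executing it.
echo "$CMD"
```

A higher mapper count also opens more concurrent connections to the database, so the value is usually chosen with the database's capacity in mind.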
10. What are the criteria for parallel copying in Sqoop with multiple map tasks?
To use multiple mappers, Sqoop needs a column on which to split the table. If the table has a primary key, Sqoop uses it as the split-by column by default. If there is no primary key, we need to name a column with unique (ideally evenly distributed) values in the --split-by argument.
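For a table without a primary key, the split column can be named explicitly. In this sketch the table employees_nopk and the column emp_id are hypothetical, and emp_id is assumed to hold unique, fairly evenly spread values.

```shell
# Hypothetical connection details; replace with your own.
JDBC_URL="jdbc:mysql://db.example.com/testdb"
SPLIT_COL="emp_id"   # assumed unique column used to divide rows among mappers

CMD="sqoop import --connect $JDBC_URL --username sqoop_user -P --table employees_nopk --split-by $SPLIT_COL --num-mappers 4"

# Print the command for review instead of executing it.
echo "$CMD"
```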
11. While loading tables from MySQL into HDFS, how can we copy tables at the maximum possible speed?
We can add the --direct argument to the import command to use the direct import fast path. As of now, --direct can be used only with MySQL and PostgreSQL.
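A direct-mode import from MySQL might look like the sketch below, with hypothetical connection details. For MySQL, direct mode relies on the mysqldump utility, so it is assumed to be available on the nodes that run the map tasks.

```shell
# Hypothetical connection details; replace with your own.
JDBC_URL="jdbc:mysql://db.example.com/testdb"

# --direct switches from plain JDBC reads to the database's native dump path.
CMD="sqoop import --connect $JDBC_URL --username sqoop_user -P --table employees --direct"

# Print the command for review instead of executing it.
echo "$CMD"
```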
12. What is an example connect string for importing tables from an Oracle database into HDFS?
We need to use the Oracle JDBC Thin driver when connecting to an Oracle database via Sqoop. The connect string takes the form jdbc:oracle:thin:@<host>:<port>:<SID>, for example when pulling the table employees from the Oracle database testdb.
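An import of the employees table from testdb could be sketched as below. The host name, port 1521, user SCOTT, and target directory are assumptions for illustration, and the Oracle JDBC driver jar is assumed to be on Sqoop's classpath already.

```shell
# Thin-driver URL in host:port:SID form; host and port are hypothetical.
ORACLE_URL="jdbc:oracle:thin:@oradb.example.com:1521:testdb"

# User and table names are written in upper case, as Oracle stores them by default.
CMD="sqoop import --connect $ORACLE_URL --username SCOTT -P --table EMPLOYEES --target-dir /user/hadoop/employees"

# Print the command for review instead of executing it.
echo "$CMD"
```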
13. While connecting to MySQL through Sqoop, I get a Connection Failure exception. What might be the root cause, and how can it be fixed?
This might be due to insufficient permissions to access the MySQL database over the network. To confirm this, we can try connecting to the MySQL database with the mysql command-line client from the Sqoop client machine.
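The connectivity check can be sketched with the mysql client as follows. The server address, database, and user are hypothetical. Passing --password without a value makes the client prompt for it.

```shell
# Hypothetical address of the MySQL server (documentation IP range).
MYSQL_HOST="192.0.2.10"

# Same host, database, and user that the Sqoop job would use.
CMD="mysql --host=$MYSQL_HOST --database=test --user=sqoop_user --password"

# Print the command; run it from the Sqoop client machine.
echo "$CMD"
```

If this client connection is refused as well, the problem lies in MySQL access rather than in Sqoop itself.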
If that connection also fails, we need to grant permissions to the user at the Sqoop client machine, as described in the answer to Question 6 in this post.
14. While importing tables from an Oracle database, I sometimes get java.lang.IllegalArgumentException: Attempted to generate class with no columns! or a NullPointerException. What might be the root cause, and how can it be fixed?
When working with an Oracle database from Sqoop, the case of table names and user names matters. Specifying both values in UPPER case will most likely solve the issue, unless the actual names mix lower and upper case. If they do, we need to provide them within double quotes.
If the source table was created under a different user's namespace, we need to provide the table name as USERNAME.TABLENAME.
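A sketch of the schema-qualified form: here the connecting user SCOTT reads a table owned by another user, HR. Both user names, the host, and the table are hypothetical.

```shell
# Hypothetical Oracle connection; replace with your own.
ORACLE_URL="jdbc:oracle:thin:@oradb.example.com:1521:testdb"
OWNER="HR"          # schema that owns the table
TABLE="EMPLOYEES"   # upper case, matching Oracle's default storage of names

CMD="sqoop import --connect $ORACLE_URL --username SCOTT -P --table $OWNER.$TABLE"

# Print the command for review instead of executing it.
echo "$CMD"
```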