In this post, we will provide some practical Sqoop interview questions and answers for experienced Hadoop developers.
Table of Contents
- Sqoop Interview Questions and Answers for Experienced
- 1. What is Sqoop?
- 2. What are the relational databases supported in Sqoop?
- 3. What are the destination types allowed in Sqoop Import command?
- 4. Is Sqoop similar to distcp in hadoop?
- 5. What are the majorly used commands in Sqoop?
- 6. When importing tables from MySQL, what precautions need to be taken care of with respect to access?
- 7. What if my MySQL server is running on MachineA and Sqoop is running on MachineB for the above question?
- 8. How many MapReduce jobs and tasks will be submitted for a Sqoop copy into HDFS?
- 9. How can we control the parallel copying of RDBMS tables into Hadoop?
- 10. What are the criteria for specifying parallel copying in Sqoop with multiple parallel map tasks?
- 11. While loading tables from MySQL into HDFS, if we need to copy tables with maximum possible speed, what can you do?
- 12. What is the example connect string for Oracle database to import tables into HDFS?
- 13. While connecting to MySQL through Sqoop, I am getting a Connection Failure exception. What might be the root cause and fix for this error scenario?
- 14. While importing tables from an Oracle database, sometimes I get java.lang.IllegalArgumentException: Attempted to generate class with no columns! or a NullPointerException. What might be the root cause and fix for this error scenario?
Sqoop Interview Questions and Answers for Experienced
1. What is Sqoop?
Sqoop is an open source tool that enables users to transfer bulk data between the Hadoop ecosystem and relational databases.
2. What are the relational databases supported in Sqoop?
Below is the list of RDBMSs that are currently supported by Sqoop.
- MySQL
- PostgreSQL
- Oracle
- Microsoft SQL Server
- IBM Netezza
- Teradata
3. What are the destination types allowed in Sqoop Import command?
Currently, Sqoop supports importing data into the below services (a Hive import example follows the list).
- HDFS
- Hive
- HBase
- HCatalog
- Accumulo
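For example, adding the --hive-import flag to a normal import sends the data into a Hive table instead of a plain HDFS directory. This is only a minimal sketch; the connection string, credentials and table name below are placeholder assumptions, not values from the original post.

# import a MySQL table directly into a Hive table (placeholders: dbhost, testdb, employees)
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --hive-import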
4. Is Sqoop similar to distcp in hadoop?
Partially, yes. Hadoop's distcp command is similar to the Sqoop import command. Both submit parallel map-only jobs, but distcp is used to copy any type of files from Local FS/HDFS to HDFS, whereas Sqoop transfers data records only between an RDBMS and Hadoop ecosystem services such as HDFS, Hive and HBase.
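As a rough comparison, the two commands look like below. The hostnames, paths and table name are placeholder assumptions used only for illustration.

# distcp copies files/directories between HDFS-compatible filesystems
hadoop distcp hdfs://namenode1:8020/user/data hdfs://namenode2:8020/user/data

# sqoop import copies table rows from an RDBMS into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --target-dir /user/hadoop/employees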
5. What are the majorly used commands in Sqoop?
In Sqoop, the import and export commands are used the most, but the commands below are also useful at times; a couple of them are shown after the list.
- codegen
- eval
- import-all-tables
- job
- list-databases
- list-tables
- merge
- metastore
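For example, list-tables and eval can be used as below to inspect the source database before a full import; the connection details and query are placeholder assumptions.

# list all tables in the source database
sqoop list-tables \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop

# run an arbitrary SQL statement and print the result to the console
sqoop eval \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --query "SELECT COUNT(*) FROM employees"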
6. When importing tables from MySQL, what precautions need to be taken care of with respect to access?
In MySQL, we need to make sure that all privileges on the databases that need to be accessed are granted to the user connecting from the Sqoop client's hostname. If Sqoop is being run on localhost and MySQL is present on the same machine, then we can grant the permissions with the below two commands from the MySQL shell, logged in as the ROOT user.
$ mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO '%'@'localhost';
mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'localhost';
7. What if my MySQL server is running on MachineA and Sqoop is running on MachineB for the above question?
From MachineA, log in to the MySQL shell and run the below commands as the root user. If the hostname of MachineB is used in the grant, then it should be added to the /etc/hosts file of MachineA.
$ mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON *.* TO '%'@'MachineB hostname or IP address';
mysql> GRANT ALL PRIVILEGES ON *.* TO ''@'MachineB hostname or IP address';
8. How many MapReduce jobs and tasks will be submitted for a Sqoop copy into HDFS?
For each Sqoop copy into HDFS, only one MapReduce job will be submitted, with 4 map tasks by default. There will not be any reduce tasks scheduled.
9. How can we control the parallel copying of RDBMS tables into Hadoop?
We can control/increase/decrease the speed of copying by configuring the number of map tasks to be run for each Sqoop copy process. We can do this by providing the -m 10 or --num-mappers 10 argument to the sqoop import command. If we specify -m 10, then it will run 10 map tasks in parallel at a time. Based on our requirement, we can increase/decrease this number to control the copy speed.
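A minimal sketch of controlling the mapper count is below; the connection details and table name are placeholder assumptions.

# default: 4 map tasks
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees

# run 10 map tasks in parallel instead
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --num-mappers 10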
10. What are the criteria for specifying parallel copying in Sqoop with multiple parallel map tasks?
To use multiple mappers in Sqoop, the RDBMS table should have a primary key column, which will be used as the split-by column in the Sqoop process. If a primary key is not present, we need to provide some other column with unique (or at least evenly distributed) values through the --split-by argument.
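For example, if the table has no primary key, we can point Sqoop at another column with reasonably unique values; the column name emp_id below is just an assumption for illustration.

# split the import on an explicitly chosen column
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --split-by emp_id \
  -m 8

Alternatively, if no suitable split column exists, -m 1 forces a single mapper, in which case no split column is needed.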
11. While loading tables from MySQL into HDFS, if we need to copy tables with maximum possible speed, what can you do?
We need to use the --direct argument in the import command to use the direct import fast path. As of now, --direct can be used only with MySQL and PostgreSQL.
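A minimal sketch of a direct-mode MySQL import is below (connection details are placeholder assumptions); for MySQL, direct mode delegates the data transfer to the mysqldump utility instead of plain JDBC reads.

# fast-path import using the database's native dump tool
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --direct \
  -m 8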
12. What is the example connect string for Oracle database to import tables into HDFS?
We need to use the Oracle JDBC Thin driver while connecting to an Oracle database via Sqoop. Below is a sample import command to pull the table employees from the Oracle database testdb.
sqoop import \
  --connect jdbc:oracle:thin:@oracle.example.com/testdb \
  --username SQOOP \
  --password sqoop \
  --table employees
13. While connecting to MySQL through Sqoop, I am getting a Connection Failure exception. What might be the root cause and fix for this error scenario?
This might be due to insufficient permissions to access your MySQL database over the network. To confirm this, we can try the below command to connect to the MySQL database from Sqoop's client machine.
$ mysql --host=<MySQL node> --database=test --user= --password=
If this is the case, then we need to grant permissions to the user at the Sqoop client machine as per the answer to Question 6 in this post.
14. While importing tables from an Oracle database, sometimes I get java.lang.IllegalArgumentException: Attempted to generate class with no columns! or a NullPointerException. What might be the root cause and fix for this error scenario?
While dealing with an Oracle database from Sqoop, the case of table names and user names matters a lot. Most probably, specifying these two values in UPPER case will solve the issue, unless the actual names are in mixed case; if they are mixed, then we need to provide them within double quotes.
In case the source table is created under a different user's schema, then we need to provide the table name as USERNAME.TABLENAME, as shown below.
sqoop import \
  --connect jdbc:oracle:thin:@oracle.example.com/ORACLE \
  --username SQOOP \
  --password sqoop \
  --table SIVA.EMPLOYEES
Hi, will an Identity Reducer run while executing the Sqoop import command?
What is incremental Sqoop?
Using --incremental append or --incremental lastmodified, Sqoop fetches only the new/updated records.
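A minimal sketch of an incremental append import is below; the connection details, check column and last value are assumptions for illustration.

# import only rows whose emp_id is greater than the last recorded value
sqoop import \
  --connect jdbc:mysql://dbhost/testdb \
  --username sqoop --password sqoop \
  --table employees \
  --incremental append \
  --check-column emp_id \
  --last-value 1000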
Hi, can you please tell me how to load CSV data into HDFS using Sqoop?
Regards, Mohan Kumar R
Hi Mohan, we can't ingest a CSV file using Sqoop. Only RDBMS sources are supported, not files.