Sqoop Hive Use Case Example 2


This is another use case on Sqoop and Hive concepts.

Hive Use Case Example

Problem Statement

About 35,000 crime incidents happened in the city of San Francisco in the last 3 months.

Our task is to store this relational data in an RDBMS and to use Sqoop to import it into Hadoop.

We need to answer the following queries on this data:

  •   Relative frequencies of different types of crime incidents
  •   Crime occurrence frequency as a function of day of the week

https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry

Step-1:

Download the data in .csv format.

Step-2:

Copy the file into HDFS, as shown in the command below.
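A sketch of the copy, assuming the downloaded file is named sfpd_incidents.csv and a target HDFS directory of /user/hive/sfpd (both names are assumptions for illustration):

```shell
# Create a working directory in HDFS (path is an assumption)
hdfs dfs -mkdir -p /user/hive/sfpd

# Copy the downloaded CSV from the local file system into HDFS
hdfs dfs -put sfpd_incidents.csv /user/hive/sfpd/

# Verify that the file landed in HDFS
hdfs dfs -ls /user/hive/sfpd
```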

Step-3:

If you observe the data, the ‘description’ column is surrounded by quotes and contains commas. So, while exporting this data into the RDBMS, we need to remove those quotes, and we should not treat the commas inside ‘description’ as delimiters. This can be achieved with the help of the CSVSerde in Hive. Below are high-level notes on it.

To Setup the CSVSerde, do the following:

  1. Download the CSV SerDe jar from https://github.com/ogrodnek/csv-serde.
  2. In the Hive CLI, submit “add jar <path-to-serde-jar>”.
  3. The “list jars” command should display the CSV SerDe if it has been added successfully.
  4. The CSV SerDe can be added to an existing table with ALTER TABLE … SET SERDE.
  5. The CSV SerDe can be added to a new table with ROW FORMAT SERDE in the CREATE TABLE statement.
  6. The default separator, quote, and escape characters from the opencsv library are the comma, the double quote, and the backslash, respectively.
  7. We can use SERDEPROPERTIES to override the default escape, quote, and separator characters. For example, we can override the defaults to use the tab character instead of a comma for the separator, a single quote instead of a double quote, and two backslashes for the escape character.
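The statements behind steps 4, 5, and 7 look roughly like the following sketch. The jar path and the table and column names are illustrative assumptions; the SerDe class name and SERDEPROPERTIES keys come from the csv-serde project:

```shell
hive <<'EOF'
-- Step 2: register the SerDe jar for the session (path is an assumption)
ADD JAR /home/user/csv-serde-1.1.2.jar;

-- Step 4: attach the CSV SerDe to an existing table (table name is an assumption)
ALTER TABLE sfpd_incidents SET SERDE 'com.bizo.hive.serde.csv.CSVSerde';

-- Step 5: use the CSV SerDe when creating a new table
CREATE TABLE sfpd_incidents_csv (incidntnum STRING, category STRING, descript STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE;

-- Step 7: override the defaults -- tab as separator, single quote as the
-- quote character, and two backslashes for the escape character
CREATE TABLE my_table (a STRING, b STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\")
STORED AS TEXTFILE;
EOF
```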

To achieve this, we first create a Hive table and load the data into it.
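A sketch of the staging table and load; the column names follow the SFPD dataset layout but are assumptions, not copied from the original post:

```shell
hive <<'EOF'
ADD JAR /home/user/csv-serde-1.1.2.jar;

-- Staging table parsed by the CSV SerDe, so quoted fields with
-- embedded commas are read as single columns
CREATE TABLE sfpd_incidents (
  incidntnum    STRING,
  category      STRING,
  descript      STRING,
  dayofweek     STRING,
  incident_date STRING,
  incident_time STRING,
  pddistrict    STRING,
  resolution    STRING,
  address       STRING,
  x             STRING,
  y             STRING,
  location      STRING
)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE;

-- Load the CSV copied into HDFS earlier (path is an assumption)
LOAD DATA INPATH '/user/hive/sfpd/sfpd_incidents.csv' INTO TABLE sfpd_incidents;
EOF
```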

Step-4:

Now we copy the data from the Hive table back to HDFS, which removes the quotes and the commas inside ‘description’ described above.
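A sketch of that step, with the output directory name as an assumption. Writing the table back out re-serializes each row with Hive's default field delimiter (Ctrl-A, \001), so the quotes and embedded commas no longer interfere:

```shell
hive <<'EOF'
-- Re-serialize the parsed rows into a plain delimited layout in HDFS
INSERT OVERWRITE DIRECTORY '/user/hive/sfpd_export'
SELECT * FROM sfpd_incidents;
EOF
```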

Step-5:

Now we export the data into the RDBMS using Sqoop.
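A sketch of the Sqoop export; the MySQL connect string, credentials, target table, and export directory are assumptions for illustration. The field terminator matches Hive's default \001 delimiter used when the table was written back to HDFS:

```shell
sqoop export \
  --connect jdbc:mysql://localhost/sfpd \
  --username root -P \
  --table sfpd_incidents \
  --export-dir /user/hive/sfpd_export \
  --input-fields-terminated-by '\001'
```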

Step-6:

Final output:

  • Relative frequencies of different types of crime incidents

  • Crime occurrence frequency as a function of day of the week
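The two results above can be produced with Hive queries along these lines (a sketch; the table and column names are assumptions carried over from the staging table):

```shell
hive <<'EOF'
-- Relative frequency of each incident category (count / overall total)
SELECT i.category,
       COUNT(*) / MAX(t.total) AS rel_freq
FROM sfpd_incidents i
CROSS JOIN (SELECT COUNT(*) AS total FROM sfpd_incidents) t
GROUP BY i.category
ORDER BY rel_freq DESC;

-- Crime occurrence frequency as a function of day of the week
SELECT dayofweek,
       COUNT(*) AS num_incidents
FROM sfpd_incidents
GROUP BY dayofweek
ORDER BY num_incidents DESC;
EOF
```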


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

