Hive Use Case Example for JSON Data


Hive use case example with US government website data

Click here to download the example data to analyze: UsaGovData

The data in the above file is in JSON format, and its JSON schema is as shown below.

Note: If you copy the text file into LFS (the local file system), make sure that there are no empty lines at the end of the file; otherwise you will encounter the exception below.

Case Study

Clickstream Analysis

Clickstreams obtained from US government websites are available in the above data file. Our task is to store this data on Hadoop and compute the following analytics:

  • The top 10 most popular sites in terms of clicks.
  • The top 10 most popular sites for each country.
  • The top 10 most popular sites for each month.
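All three analytics share the same shape: count clicks per site, sort by the count, and keep the first 10 (overall, or within each country or month). As a rough illustration of that shape outside Hive, here is a minimal plain-Java sketch. The class name TopSitesSketch, the method topSites, and the alphabetical tie-breaking are assumptions made for this example and are not part of the original solution.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: an in-memory version of "top N sites by clicks".
final class TopSitesSketch {
    private TopSitesSketch() {}

    // Each list entry is treated as one click on that site.
    // Returns the n most-clicked sites, highest count first;
    // ties are broken alphabetically by site name (an assumption).
    static List<Map.Entry<String, Long>> topSites(List<String> clickedSites, int n) {
        Map<String, Long> counts = clickedSites.stream()
                .collect(Collectors.groupingBy(site -> site, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

In Hive, the same idea would typically be expressed as a grouped count followed by an ordering and a limit.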

Solution

Because this data is in JSON format, we need a JSON SerDe, which can be downloaded from the Hive JSON Serde Download Link. Add this JSON SerDe to the classpath in the Hive shell as shown below.

1. Create a Hive table based on the data format

2. Load data into the table

3. For query tuning, we partition the above table and load the data into it

Query 1:

4. Now the top 10 sites in terms of clicks are

5. And the top 10 most popular sites for each country are

Next, since the third query needs the month, I created the MonthUDF below in Java, which takes a timestamp as input and returns the month as output.
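The logic of such a month-extraction function can be sketched as follows. This is a minimal, hedged example and not the original UDF source: it assumes the timestamp is a Unix epoch value in seconds interpreted in UTC, and the names MonthUdfSketch and monthOf are hypothetical. To keep the sketch self-contained it has no Hive dependency; an actual Hive UDF would typically extend org.apache.hadoop.hive.ql.exec.UDF and expose this logic through an evaluate method.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Sketch only: a real Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF and call this from evaluate().
final class MonthUdfSketch {
    private MonthUdfSketch() {}

    // Assumption: epochSeconds is a Unix timestamp in seconds, read as UTC.
    // Returns the month number, 1 (January) through 12 (December).
    static int monthOf(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds)
                .atZone(ZoneOffset.UTC)
                .getMonthValue();
    }
}
```

If the timestamps in the data are in milliseconds or in a local time zone, the conversion would need to be adjusted accordingly.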

Next, I added the above JAR to Hive and created a temporary function named month as follows

Next, I created a new table, clickpartition, with an extra month field, and partitioned the table by country so that querying will be fast.

Next, I copied the data from the old table clickstr to the new table clickpartition using dynamic partitioning.

Query 1:

The top 10 most popular sites in terms of clicks.

Query 2:

The top 10 most popular sites for each country.

The above query must list the top 10 sites within each country group, which requires some extra code, so I wrote the Rank UDF below.
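A common way to implement such a ranking function is a small stateful counter that restarts whenever the grouping key changes. The sketch below illustrates that idea under stated assumptions and is not the original Rank UDF: the class name RankSketch is hypothetical, ranks are 1-based, and rows are assumed to arrive already grouped by key and sorted by click count within each group. As with any Hive UDF, the real class would typically extend org.apache.hadoop.hive.ql.exec.UDF.

```java
import java.util.Objects;

// Sketch only: a stateful rank counter that depends on input row order.
final class RankSketch {
    private int counter;      // rank of the most recent row within its group
    private String lastKey;   // grouping key of the most recent row

    // Returns 1 for the first row of each key, 2 for the second, and so on.
    // Assumption: all rows with the same key arrive consecutively.
    int evaluate(String key) {
        if (!Objects.equals(key, lastKey)) {
            counter = 0;      // new group: restart the numbering
            lastKey = key;
        }
        counter++;
        return counter;
    }
}
```

In a Hive query, the rows would have to be distributed by country and sorted by country and click count before this function is applied (for example with DISTRIBUTE BY and SORT BY in a subquery), and the outer query would then keep only the rows whose rank is at most 10.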

Query 3:

The top 10 most popular sites for each month.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, and involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

