Hive Use case example for JSON Data 1


Hive use case example with US government website data

Example data to analyze: UsaGovData

The data in the above file is in JSON format, and its JSON schema is shown below:

Note: If you copy the text file into the local file system (LFS), make sure it has no empty lines at the end; otherwise, you will encounter the exception below.
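A minimal way to avoid that exception is to strip blank lines before copying the file to HDFS. The Python sketch below is an assumption-based helper, not part of the original walkthrough: it keeps every non-empty line and reports the first line that is not valid JSON, so a bad record is caught before Hive tries to deserialize it.

```python
import json
from typing import Iterable, List


def clean_json_lines(lines: Iterable[str]) -> List[str]:
    """Return the non-empty lines; raise ValueError on a line that is not valid JSON."""
    cleaned = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # drop blank lines, including trailing ones at the end of the file
        try:
            json.loads(stripped)  # validate only; the original text is kept as-is
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc
        cleaned.append(stripped)
    return cleaned
```

Usage, with hypothetical file names: open the downloaded file, pass the file object to `clean_json_lines`, and write the result joined with `"\n"` to a new file, which is then the one to copy into HDFS.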

Case Study

Clickstream Analysis

The above data file contains clickstreams obtained from US government websites. Our task is to store this data in Hadoop and compute the following analytics:

  • The top 10 most popular sites in terms of clicks
  • The top 10 most popular sites for each country
  • The top 10 most popular sites for each month
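When checking the Hive results, it can help to have a small reference implementation to run on a sample of the data. The Python sketch below computes the top N sites within each group, which covers both the per-country and the per-month analytics. The field names `url`, `country` and `ts` (a Unix timestamp in seconds) used in the usage note are hypothetical placeholders, since the schema is not reproduced here; months are bucketed in UTC, which is also an assumption.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List, Tuple


def top_n_per_group(
    records: Iterable[dict],
    site: Callable[[dict], str],
    group: Callable[[dict], str],
    n: int = 10,
) -> Dict[str, List[Tuple[str, int]]]:
    """Top-n sites by click count within each group (e.g. country or month)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for r in records:
        counts[group(r)][site(r)] += 1  # one record = one click
    # most_common orders by descending count; equal counts keep first-seen order
    return {g: c.most_common(n) for g, c in counts.items()}


def month_of(epoch_seconds: int) -> str:
    """Format a Unix timestamp (seconds) as YYYY-MM in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m")
```

Usage: `top_n_per_group(records, lambda r: r["url"], lambda r: r["country"])` gives the per-country ranking, and passing `lambda r: month_of(r["ts"])` as the group gives the per-month ranking.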

Solution

Because this data is in JSON format, we need a JSON SerDe, which can be downloaded from the Hive JSON Serde Download Link. Add the JSON SerDe to the classpath in the Hive shell as shown below:

1. Create a Hive table based on the data format.
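Conceptually, a JSON SerDe maps each JSON object onto the table's declared columns. The Python sketch below illustrates that idea only; it is not the SerDe's actual implementation. Keys that match a column are used, extra keys are ignored, and columns with no matching key come back as `None`, standing in for Hive's NULL. The column names in the usage note are hypothetical.

```python
import json
from typing import Any, List, Optional, Sequence


def json_line_to_row(line: str, columns: Sequence[str]) -> List[Optional[Any]]:
    """Map one JSON object onto a fixed column list; absent keys become None (NULL)."""
    obj = json.loads(line)
    if not isinstance(obj, dict):
        raise ValueError("each line must be a JSON object")
    # Column order follows the table definition, not the key order in the JSON text.
    return [obj.get(col) for col in columns]
```

Usage: `json_line_to_row('{"c": "US", "t": 1}', ["c", "t", "u"])` yields a three-element row whose last value is `None`, because the record has no `u` key.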

2. Load the data into the table.

3. For query tuning, we partition the table created above and load the data into it.
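Partitioning helps because Hive stores each partition in its own subdirectory named `key=value` under the table directory, so a query that filters on the partition column only has to read the matching directories. The Python sketch below models that layout and the resulting pruning. The partition key `month`, the table name `clicks` and the warehouse path are hypothetical choices for illustration; the partition column actually used by the original table is not shown here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def partition_dir(table_dir: str, key: str, value: str) -> str:
    """Hive-style partition directory: <table_dir>/<key>=<value>."""
    return f"{table_dir}/{key}={value}"


def split_into_partitions(
    records: Iterable[dict], table_dir: str, key: str, value_of: Callable[[dict], str]
) -> Dict[str, List[dict]]:
    """Group records by the partition directory each one would be written to."""
    parts: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        parts[partition_dir(table_dir, key, value_of(r))].append(r)
    return dict(parts)


def prune(partitions: Dict[str, List[dict]], table_dir: str, key: str, wanted: str) -> List[dict]:
    """Records scanned by a query filtered on key = wanted: only one directory is read."""
    return partitions.get(partition_dir(table_dir, key, wanted), [])
```

With records grouped by month, a per-month query reads one directory instead of the whole table, which is the tuning benefit the step above relies on.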

Query 1:

4. The query for the top 10 sites in terms of clicks is:
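To cross-check the output of this query on a small sample, the same aggregation (group by site, count the clicks, order by count descending, keep 10) can be sketched in Python. This is not the Hive query itself. The input is assumed to be one site identifier per click record; which JSON field identifies a site is an assumption left to the caller. Ties are broken by site name so the result is deterministic.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_sites(sites: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Top-n sites by click count, one input item per click."""
    counts = Counter(sites)
    # Sort by descending click count, then by site name so ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Usage: `top_sites(r["url"] for r in records)`, where `url` is a hypothetical field name standing in for the site field of the actual schema.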