

HCatalog and Pig Integration

In short, HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees HDFS data as a set of files, while Hive sees it as tables). With this table-based abstraction, HCatalog-supported MapReduce tools do not need to care about where the data is stored, in which format, or in which storage layer (HBase or HDFS).

We also get the facility of WebHCat to submit jobs in a RESTful way if WebHCat is configured alongside HCatalog. In this post we will look at HCatalog and Pig integration: loading and storing Hive tables from Pig via HCatalog.

Set Up:

The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables. No HCatalog-specific setup is required for these interfaces.

Note: HCatalog is not thread safe.

Load Data from Hive into Pig:

HCatLoader:

Using HCatLoader, we can load data from Hive into Pig. The basic syntax is shown below.
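
A minimal sketch, assuming a Hive table named customer_details in the default database (both names are placeholders). From Hive 0.14 the class lives in the org.apache.hive.hcatalog.pig package (older releases used org.apache.hcatalog.pig):

-- load an HCatalog-managed Hive table into a Pig relation;
-- no schema is given, Pig reads it from the Hive metastore
A = LOAD 'default.customer_details' USING org.apache.hive.hcatalog.pig.HCatLoader();
DESCRIBE A;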

The data types supported between HCatalog and Pig, as of Hive 0.14, are listed below.
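
The following mapping is reconstructed from the upstream Hive/HCatalog load-store documentation, so it is worth double-checking against your own distribution:

Hive type -> Pig type
int -> int
tinyint, smallint -> int
bigint -> long
float -> float
double -> double
decimal -> bigdecimal
string, char, varchar -> chararray
boolean -> boolean
binary -> bytearray
timestamp -> datetime
map -> map
array -> bag
struct -> tuple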

Pig does not automatically pick up the HCatalog jars. To bring in the necessary jars, start the Pig session with the option below.
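
On recent Pig releases (0.11 and later on most distributions) that looks like:

pig -useHCatalog

This makes bin/pig locate the HCatalog and Hive jars and put them on Pig's classpath, along with the Hive configuration needed to reach the metastore.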

Alternatively, we can pass the required jars on the command line, as shown below.
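
A sketch assuming HCAT_HOME and HIVE_HOME point at the HCatalog and Hive installation directories (the exact jar names and versions vary from release to release):

pig -Dpig.additional.jars=\
$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
$HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
$HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
$HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar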

HCatStorer:

HCatStorer is used with Pig scripts to write data to HCatalog/Hive managed tables.
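
A minimal sketch (the file, table, and field names are placeholders; the target Hive table must already exist, and the Pig schema must be compatible with the table's columns):

-- read delimited data from HDFS and write it into a Hive table via HCatalog
A = LOAD '/data/customer_details.csv' USING PigStorage(',') AS (id:int, name:chararray);
STORE A INTO 'default.customer_details' USING org.apache.hive.hcatalog.pig.HCatStorer();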

We can also store directly into a partitioned table, as shown below.
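
For example, assuming a hypothetical web_logs table partitioned on a datestamp column, a single static partition is passed to HCatStorer as a key=value argument:

-- every record in A lands in the datestamp=20160401 partition
STORE A INTO 'default.web_logs' USING org.apache.hive.hcatalog.pig.HCatStorer('datestamp=20160401');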

To achieve dynamic partitioning, i.e. to write into multiple partitions at once, make sure that the partition column is present in our data, then call HCatStorer with no arguments, as shown below.
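
Continuing the hypothetical web_logs example, datestamp must now be an ordinary field of relation A, and HCatStorer routes each record to the partition matching its value:

-- one partition is created per distinct datestamp value in A
STORE A INTO 'default.web_logs' USING org.apache.hive.hcatalog.pig.HCatStorer();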


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved with several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



3 thoughts on "HCatalog and Pig Integration"

  • aashish soni

    Hi Siva, I was trying to load data from a Hive table into Pig with

    pig -x -useHCatalog

    but it always says "Please initialize HIVE_HOME".

    I also tried:

    /home/mohandas/pig-0.11.1/bin/pig -Dpig.additional.jars=\
    $HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
    $HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
    $HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
    $HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
    $HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/lib/slf4j-api-*.jar

    Can you please guide me on where I am going wrong?

  • Suchintak Patnaik

    Do we have to manage the Hive table using HCatalog prior to using that data in Pig?

    Or can we simply create a Hive table, load the data, and then access it in Pig using HCatalog?

