In this post, we will discuss the setup needed for HBase integration with Hive. We will test this integration by creating a test HBase table from the Hive shell, populating it from another Hive table, and finally verifying its contents in HBase.
The main reason to use Hive on HBase is that a lot of data sits in HBase because of its use in real-time environments, yet it is rarely used for analysis, since fewer tools can connect to HBase directly.
We will use Hive's storage handler mechanism to create HBase tables via Hive. HBaseStorageHandler lets Hive DDL statements manage table definitions in both the Hive metastore and HBase’s catalog simultaneously and consistently.
Setup for HBase Integration with Hive:
To set up HBase integration with Hive, we mainly need a few jar files to be present in the $HIVE_HOME/lib or $HBASE_HOME/lib directory. The required jar files are:
The $HBASE_HOME/lib directory contains many hbase-*.jar files; below is the list for the Hadoop 2 API.
We need to add the paths of the above jar files to the value of the hive.aux.jars.path property in the hive-site.xml configuration file.
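To make the shape of that property concrete, here is a small Python sketch that builds the <property> element for hive-site.xml. It assumes the common convention of listing the jars as comma-separated file:// URIs; the function name aux_jars_property, the install directories, and the version numbers in the jar names are all hypothetical, so substitute the jars actually present on your machine.

```python
import xml.etree.ElementTree as ET


def aux_jars_property(jar_paths):
    """Build a hive-site.xml <property> element for hive.aux.jars.path.

    The value is a comma-separated list of file:// URIs, one per jar.
    """
    uris = []
    for path in jar_paths:
        # file:// URIs need absolute paths, so reject relative ones early.
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        uris.append("file://" + path)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hive.aux.jars.path"
    ET.SubElement(prop, "value").text = ",".join(uris)
    return prop


# Hypothetical locations and version numbers; list the jars that are
# actually present in your $HIVE_HOME/lib and $HBASE_HOME/lib instead.
jars = [
    "/usr/local/hive/lib/hive-hbase-handler-1.2.1.jar",
    "/usr/local/hbase/lib/hbase-client-1.1.2.jar",
    "/usr/local/hbase/lib/hbase-common-1.1.2.jar",
]
snippet = ET.tostring(aux_jars_property(jars), encoding="unicode")
print(snippet)
```

The printed element goes inside the <configuration> root of hive-site.xml. Generating it from a list keeps the comma separators and file:// prefixes consistent when the list of jars grows.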
Verify HBase Integration with Hive:
Let's create a new HBase table via the Hive shell. To test HBase table creation, we need the Hadoop, YARN and HBase daemons to be running.
Below is a sample DDL statement for creating an HBase-backed table. It creates the hbase_table_emp table in Hive and the emp table in HBase. The Hive table contains three columns: id int, name string and role string. The name and role columns are mapped to the name and role columns of the cf1 column family in HBase. Here, “:key” is specified at the beginning of the “hbase.columns.mapping” property, so it maps to the first column (id int) of the Hive table, which becomes the HBase row key.
HBase is a special case here: its unique row key is mapped with :key, but not all the columns in the HBase table need to be mapped to Hive columns.
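The following Python sketch shows how the DDL fits together by generating it from a column list. It assumes the first Hive column is named id; the helper hbase_table_ddl is hypothetical, while HBaseStorageHandler, hbase.columns.mapping and hbase.table.name are the real Hive names described above. Each mapping entry lines up positionally with a Hive column, and the sketch enforces that exactly one entry is :key.

```python
def hbase_table_ddl(hive_table, hbase_table, columns):
    """Build a Hive CREATE TABLE statement backed by HBaseStorageHandler.

    columns: list of (hive_name, hive_type, hbase_column) tuples, where
    hbase_column is ":key" for the row key or "family:qualifier" otherwise.
    """
    mappings = [hbase_col for _, _, hbase_col in columns]
    # Hive requires exactly one column to be mapped to the HBase row key.
    if mappings.count(":key") != 1:
        raise ValueError("exactly one column must be mapped to :key")
    for entry in mappings:
        family, sep, _ = entry.partition(":")
        if entry != ":key" and (not sep or not family):
            raise ValueError(f"bad mapping entry {entry!r}")
    hive_cols = ", ".join(f"{name} {typ}" for name, typ, _ in columns)
    mapping = ",".join(mappings)
    return (
        f"CREATE TABLE {hive_table}({hive_cols})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


ddl = hbase_table_ddl(
    "hbase_table_emp",
    "emp",
    [
        ("id", "int", ":key"),          # Hive id -> HBase row key
        ("name", "string", "cf1:name"),  # Hive name -> cf1:name
        ("role", "string", "cf1:role"),  # Hive role -> cf1:role
    ],
)
print(ddl)
```

The printed statement can be pasted into the Hive shell. Keeping the Hive columns and the mapping entries in one list makes it hard for the two to drift out of order.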
Let's verify the emp table in the HBase shell and view its metadata.
We cannot load data directly into the HBase table “emp” with Hive's LOAD DATA INPATH command. Instead, we have to copy data into it from another Hive table. Let's create another test Hive table, testemp, with the same schema as hbase_table_emp, and insert records into it with the LOAD DATA INPATH command.
Let's copy the contents of testemp into the hbase_table_emp table and verify its contents.
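The copy is typically done with an INSERT ... SELECT statement such as INSERT OVERWRITE TABLE hbase_table_emp SELECT * FROM testemp;. One detail worth knowing when verifying the result is that every Hive row becomes a write to the HBase row named by its :key column, so duplicate keys collapse into one HBase row. The Python model below sketches this, starting from an empty emp table. It assumes Hive's default string storage for values, and the sample rows and function names are hypothetical. Which duplicate survives in a real job depends on write order; this sketch simply keeps the last one.

```python
def copy_rows_to_hbase(rows, mapping):
    """Model INSERT ... SELECT into an HBase-backed Hive table.

    rows:    tuples in Hive column order.
    mapping: hbase.columns.mapping entries in the same order,
             e.g. [":key", "cf1:name", "cf1:role"].
    Returns {row_key: {"family:qualifier": value}}.
    """
    key_index = mapping.index(":key")
    table = {}
    for row in rows:
        # Assumes the default string storage type: values are stored as text.
        row_key = str(row[key_index])
        # A second row with the same key updates the existing HBase row
        # instead of adding a new one.
        cells = table.setdefault(row_key, {})
        for column, value in zip(mapping, row):
            if column != ":key":
                cells[column] = str(value)
    return table


def scan_order(table):
    """Row keys in the order an HBase scan returns them (byte-wise sort)."""
    return sorted(table, key=lambda k: k.encode("utf-8"))


# Hypothetical contents of testemp; row key 1 appears twice.
testemp = [
    (1, "Alice", "Engineer"),
    (2, "Bob", "Manager"),
    (10, "Carol", "Analyst"),
    (1, "Alice", "Architect"),
]
emp = copy_rows_to_hbase(testemp, [":key", "cf1:name", "cf1:role"])
print(len(testemp), "Hive rows ->", len(emp), "HBase rows")
print(scan_order(emp))
```

This prints 4 Hive rows -> 3 HBase rows, followed by ['1', '10', '2']: with string-encoded keys, a scan of emp in the HBase shell lists row 10 before row 2, so a row count or ordering that differs from testemp is not necessarily a failed copy.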