HBase Integration with Hive

In this post, we will discuss the setup needed for HBase integration with Hive, test the integration by creating a test HBase table from the Hive shell, populate it from another Hive table, and finally verify its contents from the HBase shell.

The main reason to use Hive on HBase is that a lot of data sits in HBase because of its use in real-time environments, but it is rarely used for analysis, since few tools connect to HBase directly for that purpose.

We will use the storage handler mechanism to create HBase tables via Hive. HBaseStorageHandler allows Hive DDL to manage table definitions in both the Hive metastore and HBase’s catalog simultaneously and consistently.

Setup for HBase Integration with Hive:

To set up HBase integration with Hive, we mainly need a few jar files to be present in the $HIVE_HOME/lib or $HBASE_HOME/lib directory.

Here, the $HBASE_HOME/lib directory already contains many hbase-*.jar files; the required jars for the Hadoop 2 API are listed below.
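
Exact file names and versions vary with the Hive and HBase releases installed, so treat the names below as representative rather than exact:

  hive-hbase-handler-*.jar   (from $HIVE_HOME/lib)
  hbase-client-*.jar
  hbase-common-*.jar
  hbase-server-*.jar
  hbase-protocol-*.jar
  hbase-hadoop2-compat-*.jar
  zookeeper-*.jar
  guava-*.jar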

We need to add the paths of the above jar files to the value of the hive.aux.jars.path property in the hive-site.xml configuration file.
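
As a sketch, the entry in hive-site.xml could look like the following; the installation paths and version numbers here are placeholders and must be replaced with the actual locations of the jars on our machine:

  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/local/hive/lib/hive-hbase-handler-0.13.1.jar,file:///usr/local/hbase/lib/hbase-client-0.96.2-hadoop2.jar,file:///usr/local/hbase/lib/hbase-common-0.96.2-hadoop2.jar,file:///usr/local/hbase/lib/hbase-server-0.96.2-hadoop2.jar,file:///usr/local/hbase/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///usr/local/hbase/lib/zookeeper-3.4.5.jar,file:///usr/local/hbase/lib/guava-12.0.1.jar</value>
  </property>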

Verify HBase Integration with Hive:

Let’s create a new HBase table via the Hive shell. To test HBase table creation, the Hadoop, YARN and HBase daemons need to be running.
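
For reference, on a plain Apache Hadoop and HBase installation the daemons can be started and checked roughly as follows (script locations are the standard ones and may differ in packaged distributions):

  $HADOOP_HOME/sbin/start-dfs.sh      # NameNode, DataNode, SecondaryNameNode
  $HADOOP_HOME/sbin/start-yarn.sh     # ResourceManager, NodeManager
  $HBASE_HOME/bin/start-hbase.sh      # HMaster, HRegionServer (and HQuorumPeer if HBase manages ZooKeeper)
  jps                                 # lists the daemons currently running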

Below is a sample HBase table creation DDL statement. In it, we create the hbase_table_emp table in Hive and the emp table in HBase. The Hive table contains three columns: key int, name string and role string. The name and role columns are mapped to two columns of the same names belonging to the cf1 column family. The “:key” entry specified at the beginning of the “hbase.columns.mapping” property automatically maps to the first column (key int) of the Hive table.

HBase is a special case here: it has a unique row key, which is mapped with :key, and not all of the columns in the HBase table need to be mapped.

HBase Table Creation
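
A DDL statement matching the description above looks roughly like this when run from the hive shell:

  CREATE TABLE hbase_table_emp(key int, name string, role string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
  TBLPROPERTIES ("hbase.table.name" = "emp");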

Let’s verify this emp table in the HBase shell and view its metadata.

HBase Table
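
From the HBase shell, commands along these lines list the new table and show its column family metadata:

  list              # run inside the hbase shell; 'emp' should appear
  describe 'emp'    # shows the cf1 column family and its settings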

We cannot directly load data into the HBase-backed table “emp” with the Hive LOAD DATA INPATH command. We have to copy data into it from another Hive table. Let’s create another test Hive table, testemp, with the same schema as hbase_table_emp, and insert records into it with the LOAD DATA INPATH command.

Hive Table Creation
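
A sketch of the staging table and load step is shown below; the comma delimiter and the input file path (/user/hduser/emp.csv) are illustrative assumptions, not values from the original example:

  CREATE TABLE testemp(key int, name string, role string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

  LOAD DATA INPATH '/user/hduser/emp.csv' INTO TABLE testemp;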

Let’s copy the contents of testemp into the hbase_table_emp table and verify the results.

Hive Table Insert
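
The copy itself is a plain INSERT ... SELECT from the staging table, followed by a quick query against the HBase-backed table:

  INSERT OVERWRITE TABLE hbase_table_emp SELECT * FROM testemp;

  SELECT * FROM hbase_table_emp;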

Let’s see the contents of the emp table from the HBase shell.

HBase Table Scan
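
On the HBase side, a scan shows the same rows stored as cells under the cf1 column family:

  scan 'emp'    # run inside the hbase shell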

So we have successfully integrated HBase with Hive, and created and populated a new HBase table from the Hive shell.

Note:

If our table is very big, then to save storage space on HDFS we can drop the testemp table from Hive after inserting its records into hbase_table_emp, instead of maintaining two copies of the same data.

Mapping Existing HBase Tables to Hive:

Similar to creating new HBase tables, we can also map existing HBase tables to Hive. To give Hive access to an existing HBase table with multiple columns and column families, we need to use CREATE EXTERNAL TABLE. Here, hbase.columns.mapping is required and is validated against the existing HBase table’s column families, whereas hbase.table.name is optional.

To test this, we will create a ‘user’ table in HBase as shown below and map it to a Hive table.

HBase Table Operations
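
As an assumed layout, the sketch below gives the ‘user’ table a single column family cf1 holding name and city columns; the actual columns can differ without changing the mapping mechanics:

  create 'user', 'cf1'
  put 'user', 'row1', 'cf1:name', 'Tom'
  put 'user', 'row1', 'cf1:city', 'Hyderabad'
  put 'user', 'row2', 'cf1:name', 'Harry'
  put 'user', 'row2', 'cf1:city', 'Bangalore'
  scan 'user'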

Let’s create the corresponding Hive table for the above ‘user’ HBase table. Below is the DDL for creating the external table ‘hbase_table_user’.
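
A DDL sketch for this external table, matching the assumed cf1:name and cf1:city columns used above:

  CREATE EXTERNAL TABLE hbase_table_user(key string, name string, city string)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:city")
  TBLPROPERTIES ("hbase.table.name" = "user");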

Verify the contents of the hbase_table_user table.

Hive Table Mapping with HBase
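
A simple query from the hive shell confirms that the rows written through the HBase shell are visible through the mapped table:

  SELECT * FROM hbase_table_user;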

So, we have successfully mapped an existing HBase table to a Hive external table.