Hive Metastore Configuration 4


Hive Metastore:

  • In Hive, metastore is the central repository to store metadata for hive tables/partitions. Any datastore that has a JDBC driver can be used as a metastore.
  • By default, the metastore service runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk.
  • There are 3 different ways to setup the metastore server using different Hive configurations:
    • Embedded Metastore
    • Local Metastore
    • Remote Metastore
  • Embedded metastore: Derby is the default database for the embedded metastore. It is a simple way to get started with Hive. But we can only have one Hive session open at a time that shares the same metastore. Trying to start a second session gives the error:

Failed to start database ‘metastore_db’

So, embedded metastore is not practical solution but can be used for unit tests.

  • Local Metastore:  In local metastore, a standalone database (MySQL or PostgreSQL or any other db with JDBC driver) is used as a metastore. 

In this mode, multiple users can open connection to metastore and make SQL queries against it.

  • Remote Metastore:  In remote metastore, all Hive Clients will make a connection to a metastore server (MySQL or PostgreSQL or Oracle) which in turn queries the datastore. In this, metastore server run in  a separate process to the Hive service.

Below is the high level architecture of three different Hive metastores:

hive metastore

Below are the configuration properties and their default values which will control the above metastore setups.

CONFIGURATION PROPERTY DEFAULT VALUE      DESCRIPTION
javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=metastore_db;create=true     Derby database located at current working directory
javax.jdo.option.ConnectionDriverName org.apache.derby.jdbc.EmbeddedDriver                       Derby embeded JDBC driver class.
hive.metastore.warehouse.dir /user/hive/warehouse                                              location of database for the warehouse
hive.metastore.uris Blank   Thrift URI for the remote metastore. Blank for local mode.

Note:

  • Here in the above default settings, in javax.jdo.option.ConnectionURL property,  databaseName=metastore_db;create=true, denotes that metastore_db  will be created when it is not present in the current working directory.
  • That’s why, whenever we try to run hive service in a different working directory, it will create a metastore_db in that folder as well and no tables from previous session will be available.
  • if we specify the complete path in databaseName field like file:///usr/lib/hive/db  then metastore_db folder will be created in that location and the same will be accessed every time when we start hive service. So, previous tables and partitions can be accessed even from other working directories. 
To Maintain a single copy of Embedded Metastore:

Add the below property to hive-site.xml file.

Configuring Local Metastore:

Below are the configuration details for setting Local Metastore with separate MySQL database. If MySQL is not installed already, please follow the instructions from the post to install MySQL on Ubuntu.

Note:  The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is simply achieved by placing it in Hive’s lib directory. MySQL Connector/J jar file can be downloaded from MySQL site.

Add the below properties to hive-site.xml file.

In the above hive-site.xml file, specify your root user password in place of xxxxxx in javax.jdo.option.ConnectionPassword property.

Start Hive CLI service:

Start hive CLI (command line interface) service with $ hive command on terminal after starting start-dfs.sh daemons we should get hive shell open without any error messages as shown below.

hive start

Start Hive Thrift server:

Start hive thrift server with below command and running service process can be verified with $ jps -lm command.

start hive thrift server

hive server ps

I.e. the above local metastore configuration is successful through standalone MySQL database. Successful start of hive service will create metastore database specified in hive-site.xml in MySQL with root privileges and we can verify the same.

mysql metastore

With this we can say that Hive service with Local Metastore setup is successful.

Start Hive Metastore service:

We can start the metastore service with the below command.

start hive metastore

Hive metastore listens on port 9083 by default and the same can be verified below to test whether metastore started successfully or not.

metastore port

Configure Remote Metastore:

We have successfully configured local metastore in the above section. Suppose if we want to add another node (node2) to the existing cluster and new node should use the same metastore on node1, then we have to setup the hive-site.xml file in node2 in such a way that it should reference metastore present in the remote machine (node1).  

To access the remote metastore from node2 we need to set the below property in hive-site.xml file in node2.

To access the tables created in node1 hive shell, from node2, we need to keep hive metastore service running (with command: hive –service metastore) on node1 and start using hive -cli from node2 after setting hive metastore URI successfully as shown above.


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

4 thoughts on “Hive Metastore Configuration

  • prabha

    I’m getting an issue as “metastore_db” is already started when request from second hive context. I’m using the default metastore_db only. Please suggest me

  • Ravi Bhata

    Hi

    I am Ravi Bhatta, i am trying to connect hive database to java program.
    I have managed to connect hive to mysql by configuring hive-site.xml in local mode.
    But i am only able to get the meta data. Please tell me the procedure to connect hive database to java program and get the raw data.


Review Comments
default image

I have attended Siva’s Spark and Scala training. He is good in presentation skills and explaining technical concepts easily to everyone in the group. He is having excellent real time experience and provided enough use cases to understand each concepts. Duration of the course and time management is awesome. Happy that I found a right person on time to learn Spark. Thanks Siva!!!

Dharmeswaran ETL / Hadoop Developer Spark Nov 2016 September 21, 2017

.