Hive Metastore Configuration 4


Hive Metastore:

  • In Hive, metastore is the central repository to store metadata for hive tables/partitions. Any datastore that has a JDBC driver can be used as a metastore.
  • By default, the metastore service runs in the same JVM as the Hive service and contains an embedded Derby database instance backed by the local disk.
  • There are 3 different ways to setup the metastore server using different Hive configurations:
    • Embedded Metastore
    • Local Metastore
    • Remote Metastore
  • Embedded metastore: Derby is the default database for the embedded metastore. It is a simple way to get started with Hive. But we can only have one Hive session open at a time that shares the same metastore. Trying to start a second session gives the error:

Failed to start database ‘metastore_db’

So, embedded metastore is not practical solution but can be used for unit tests.

  • Local Metastore: In local metastore, a standalone database (MySQL or PostgreSQL or any other db with JDBC driver) is used as a metastore.

In this mode, multiple users can open connection to metastore and make SQL queries against it.

  • Remote Metastore: In remote metastore, all Hive Clients will make a connection to a metastore server (MySQL or PostgreSQL or Oracle) which in turn queries the datastore. In this, metastore server run in a separate process to the Hive service.

Below is the high level architecture of three different Hive metastores:

hive metastore

Below are the configuration properties and their default values which will control the above metastore setups.

CONFIGURATION PROPERTY DEFAULT VALUE DESCRIPTION
javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=metastore_db;create=true Derby database located at current working directory
javax.jdo.option.ConnectionDriverName org.apache.derby.jdbc.EmbeddedDriver Derby embeded JDBC driver class.
hive.metastore.warehouse.dir /user/hive/warehouse location of database for the warehouse
hive.metastore.uris Blank Thrift URI for the remote metastore. Blank for local mode.

Note:

  • Here in the above default settings, in javax.jdo.option.ConnectionURL property, databaseName=metastore_db;create=true, denotes that metastore_db will be created when it is not present in the current working directory.
  • That’s why, whenever we try to run hive service in a different working directory, it will create a metastore_db in that folder as well and no tables from previous session will be available.
  • if we specify the complete path in databaseName field like file:///usr/lib/hive/db then metastore_db folder will be created in that location and the same will be accessed every time when we start hive service. So, previous tables and partitions can be accessed even from other working directories.
To Maintain a single copy of Embedded Metastore:

Add the below property to hive-site.xml file.

Configuring Local Metastore:

Below are the configuration details for setting Local Metastore with separate MySQL database. If MySQL is not installed already, please follow the instructions from the post to install MySQL on Ubuntu.

Note: The JDBC driver JAR file for MySQL (Connector/J) must be on Hive’s classpath, which is simply achieved by placing it in Hive’s lib directory. MySQL Connector/J jar file can be downloaded from MySQL site.

Add the below properties to hive-site.xml file.