Hive Installation in Pseudo Distribution Mode


In this post, we will discuss Hive installation on Ubuntu in pseudo-distributed mode. We install the latest release of Hive at the time of writing, which is version 0.13.1 (hive-0.13.1).

HCatalog has been included with Hive since release 0.11.0, so we can optionally set up the configuration required for HCatalog as part of our Hive 0.13.1 installation. WebHCat has also been installed with Hive since release 0.11.0.

Hive Installation on Ubuntu

Prerequisite

  • JDK 1.6 or a later version of Java installed on our Ubuntu machine.
  • Hadoop 1 or Hadoop 2 installed and configured properly. The HADOOP_HOME environment variable should be set to Hadoop’s installation directory.

Hive Installation Procedure

  • Download the latest stable version of Hive that is compatible with your existing Hadoop version. Generally, Hive works with the latest release of Hadoop. Hive binary tarballs can be downloaded from the Apache download mirrors.
  • Copy apache-hive-0.13.1-bin.tar.gz to the preferred Hive installation directory, usually /usr/lib/hive, and unpack the tarball there.
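
The copy-and-unpack step can be sketched as a small shell function. This is only a sketch under assumptions: the download location under ~/Downloads, the variable names HIVE_TARBALL and HIVE_PARENT, and the function name install_hive are hypothetical and not part of Hive itself.

```shell
# Assumed locations (hypothetical defaults); override them before calling install_hive.
HIVE_TARBALL="${HIVE_TARBALL:-$HOME/Downloads/apache-hive-0.13.1-bin.tar.gz}"
HIVE_PARENT="${HIVE_PARENT:-/usr/lib/hive}"

# install_hive: copy the tarball into the installation directory and unpack it there.
install_hive() {
    # Stop early if the tarball has not been downloaded yet.
    if [ ! -f "$HIVE_TARBALL" ]; then
        echo "tarball not found: $HIVE_TARBALL" >&2
        return 1
    fi
    mkdir -p "$HIVE_PARENT" || return 1
    cp "$HIVE_TARBALL" "$HIVE_PARENT/" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$HIVE_PARENT/$(basename "$HIVE_TARBALL")" -C "$HIVE_PARENT"
}
```

After defining the function, run install_hive from a shell that has write access to /usr/lib/hive (a root shell, for example). The tarball unpacks into an apache-hive-0.13.1-bin subdirectory.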

[Screenshot: Hive installation commands run in the terminal]

  • Set the HIVE_HOME and HIVE_CONF_DIR environment variables in the .bashrc file, and add the Hive bin directory to the PATH environment variable.
  • Optionally, set the environment variables for HCatalog and WebHCat in the .bashrc file as well. The screenshot below shows the .bashrc file after setting these variables.
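
As a rough illustration, the .bashrc additions might look like the lines below. The installation path assumes the tarball was unpacked under /usr/lib/hive, and the HCatalog lines assume the hcatalog directory layout shipped inside the Hive 0.13.1 binary tarball; adjust both to match your machine.

```shell
# Hive environment (assumed install path; change it if Hive lives elsewhere)
export HIVE_HOME=/usr/lib/hive/apache-hive-0.13.1-bin
export HIVE_CONF_DIR="$HIVE_HOME/conf"
export PATH="$PATH:$HIVE_HOME/bin"

# Optional: HCatalog and WebHCat scripts (assumed to sit under $HIVE_HOME/hcatalog)
export HCAT_HOME="$HIVE_HOME/hcatalog"
export PATH="$PATH:$HCAT_HOME/bin:$HCAT_HOME/sbin"
```

Run source ~/.bashrc, or open a new terminal, for the variables to take effect.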

[Screenshot: .bashrc file after setting the Hive environment variables]

Configure Hive

With the above installation steps we can already run the Hive service, but we can optionally set the configuration parameters below. These changes are recommended but not mandatory for running a simple Hive service.

  • Create a hive-site.xml file under the HIVE_CONF_DIR (HIVE_HOME/conf) directory, if it is not already present, with the properties described below.
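
A minimal hive-site.xml with the three properties discussed in this section could be generated as sketched here. The values are assumptions chosen to mirror the usual defaults (-1 reducers, a scratch directory under /tmp, a warehouse under /user/hive/warehouse), not necessarily the values used in the original setup, and the function name write_hive_site is hypothetical.

```shell
# write_hive_site DIR: create DIR/hive-site.xml unless it already exists.
# All property values below are illustrative assumptions.
write_hive_site() {
    conf_dir="$1"
    target="$conf_dir/hive-site.xml"
    # Leave an existing configuration untouched.
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting" >&2
        return 0
    fi
    mkdir -p "$conf_dir" || return 1
    # Quoted 'EOF' keeps ${user.name} literal so that Hive, not the shell, expands it.
    cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
}
```

With the environment variables set, the call would be write_hive_site "$HIVE_CONF_DIR".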

Below are detailed descriptions of these properties.

mapred.reduce.tasks : The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is “local”. Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. When this property is set to -1, Hive automatically determines the number of reducers.

hive.exec.scratchdir : Hive stores per-query temporary/intermediate data sets under this directory, and they are normally cleaned up by the Hive client when the query finishes.

When writing data to a table or partition, Hive will first write to a temporary location on the target table’s filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table.

hive.metastore.warehouse.dir : The HDFS directory in which Hive stores the managed tables under its control.
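
Because the scratch and warehouse locations live in HDFS, they generally need to exist and be group-writable before tables are created. The sketch below assumes the paths /tmp and /user/hive/warehouse (adjust them to whatever hive-site.xml specifies); the HADOOP_CMD variable and the function name prepare_hive_dirs are hypothetical.

```shell
# Command used to talk to HDFS; assumes the hadoop launcher is on PATH.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# prepare_hive_dirs: create the assumed scratch and warehouse directories in HDFS
# and make them group-writable.
prepare_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        # -p is the Hadoop 2 form; on Hadoop 1 it may need to be dropped.
        "$HADOOP_CMD" fs -mkdir -p "$dir" || return 1
        "$HADOOP_CMD" fs -chmod g+w "$dir" || return 1
    done
}
```

Run prepare_hive_dirs once, with the Hadoop daemons already started.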

  • By default, the Hive metastore runs in an embedded Derby database, which allows only one active Hive session at a time, so multiple users cannot access the Hive server simultaneously. To configure the metastore for multiple concurrent users, read the post Configuring Metastore for Hive.

Verify Hive Installation

With the above changes, the basic setup and configuration of the Hive server is done, and we are ready to check our installation.

We can verify the Hive installation with the $ hive --help command, or by starting the default Hive CLI service with the $ hive or $ hive --service cli commands, as shown below.
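
The check can also be wrapped in a small helper that first confirms the hive launcher is reachable. This is a sketch only; the function name check_hive is hypothetical.

```shell
# check_hive: report whether the hive launcher is on PATH, then print its usage text.
check_hive() {
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive not found on PATH; check HIVE_HOME and PATH in .bashrc" >&2
        return 1
    fi
    echo "hive found at: $(command -v hive)"
    # --help prints the launcher usage instead of opening a CLI session.
    hive --help
}
```

If check_hive succeeds, running hive (or hive --service cli) should open the interactive hive> prompt.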

[Screenshot: output of the hive --help command]

If we receive messages like those shown above, our installation is successful; otherwise, we need to review the instructions once again.

Note: HiveServer2, introduced in Hive 0.11, has a new CLI called Beeline. To use Beeline, execute the $ bin/beeline command in the Hive home directory.


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.



4 thoughts on “Hive Installation in Pseudo Distribution Mode”

  • SureshKumar

    Hi Team,

    I am a newbie to Hadoop and I want to explore things by learning. Initially I started using CDH4 in a VM on my Windows PC, but my system became very slow and even Eclipse was exiting abnormally.

    I thought of installing all the ecosystem components directly on ubuntu which I like most and very comfortable with.

    Your website helped me really well in installing all the ecosystem components within 4 hours.

    I got some exceptions while installing the latest versions of Pig and Hive, but they were resolved when I switched back to earlier versions.

    Finally I am very happy that I installed most of the ecosystem components with the help of your website.

    Thank you,
    SureshhKumar

  • Ayesha

    Hi,

    Your website is very helpful. I have around 5 years of experience with ETL development and I have no knowledge of Java programming. Can I learn Hadoop and pursue a good career in it in the future?

    Your advice will be very helpful.

    Can you please share your email id?

  • raja

    Hello Shiva,

    I am trying to install and configure Hive 2.0.0 with Hadoop 2.7.2 on Ubuntu 15.10, but I am getting the error below. Can you please assist?

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hive/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

    Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
    Exception in thread “main” java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType …) to create the schema. If needed, don’t forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)

