Hive Architecture



Below is the high-level architecture of Hive:

[Figure: Hive high-level architecture diagram]

 

A Hive distribution mainly includes the following components:

  • CLI — Command Line Interface. It is the most common way of interacting with Hive (the Hive shell). This is the default service.
  • HWI — Hive Web Interface. It is an alternative to the shell for interacting with Hive through a web browser.
  • JDBC/ODBC/Thrift Server — These provide programmatic access to the Hive server. Applications using the Thrift, JDBC, and ODBC connectors need a running Hive server to communicate with Hive. The HIVE_PORT environment variable specifies the port (defaults to 10000) on which the server listens.
  • Driver — The driver compiles the input commands and queries, optimizes the
    required computation, and executes the required steps as MapReduce jobs.
  • Metastore — The metastore is the central repository of Hive metadata. It is divided into two pieces: a service and the backing store for the data. By default, the metastore service runs in the same process as the Hive service, but it is also possible to run the metastore as a standalone (remote) process. Set the
    METASTORE_PORT environment variable to specify the port on which the metastore server listens.
  • Job Tracker — Hive communicates with the JobTracker to initiate MapReduce jobs. Hive does not have to run on the same master node as the JobTracker.
  • Namenode — The data files to be processed reside in HDFS, which is managed by the NameNode.
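
The port settings above can be sketched as shell commands. This is a sketch, assuming the classic `hive` launcher script of older Hive releases, where `hiveserver` and `metastore` are service names:

```shell
# Start the Hive Thrift server on an explicit port
# (defaults to 10000 when HIVE_PORT is unset).
export HIVE_PORT=10000
hive --service hiveserver &

# Run the metastore as a standalone (remote) process,
# listening on the port given by METASTORE_PORT.
export METASTORE_PORT=9083
hive --service metastore &
```

Running these requires a Hive installation; they are shown only to illustrate how the HIVE_PORT and METASTORE_PORT variables are used.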
Hive clients:

Below are the three main clients that can interact with the Hive server:

Thrift Client: The Hive Thrift client can run Hive commands from a wide range of programming languages. Thrift bindings for Hive are available for Java, Python, and Ruby.

JDBC Driver: Hive provides a JDBC driver, defined in the class
org.apache.hadoop.hive.jdbc.HiveDriver. When configured with a JDBC URI of
the form jdbc:hive://host:port/dbname, a Java application will connect to a Hive
server running in a separate process at the given host and port.

We may alternatively connect to Hive through JDBC in embedded mode using the URI jdbc:hive://. In this mode, Hive runs in the same JVM as the application
invoking it, so there is no need to launch it as a standalone server, since the
connection does not use the Thrift service or the Hive Thrift Client.
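
The two URI forms above can be sketched in Java as follows. The helper methods just build the URIs described in the text; the `connect` method is an assumption of typical JDBC usage and needs a running Hive server plus the Hive JDBC driver on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveJdbcSketch {

    // Remote mode: URI of the form jdbc:hive://host:port/dbname,
    // pointing at a Hive server running in a separate process.
    static String remoteUri(String host, int port, String dbname) {
        return "jdbc:hive://" + host + ":" + port + "/" + dbname;
    }

    // Embedded mode: Hive runs in the same JVM as the application.
    static String embeddedUri() {
        return "jdbc:hive://";
    }

    // Sketch of opening a connection (requires a running Hive server
    // for remote mode, or a local Hive install for embedded mode).
    static Connection connect(String uri) throws Exception {
        // Register the driver class named in the text.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        return DriverManager.getConnection(uri, "", "");
    }
}
```

For example, `remoteUri("localhost", 10000, "default")` yields the URI for a Hive server on the default port of the local machine.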

ODBC Driver:   Hive ODBC Driver allows applications that support the ODBC protocol to connect to Hive. Like the JDBC driver, the ODBC driver uses Thrift to communicate with the Hive server.


