Below are some important Hive interview questions and answers for Hadoop developers and administrators.
Hive Interview Questions and Answers
1. What is Metadata?
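Metadata is data about data, for example the table names, column types, and storage locations that describe a data set.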
2. What is Hive?
Hive is one of the important tools in the Hadoop ecosystem. It provides an SQL-like dialect for querying data stored in the Hadoop Distributed File System (HDFS).
3. What are the features of Hive?
- Tools to enable easy data extract/transform/load (ETL)
- A mechanism to project structure on a variety of data formats
- Access to files stored either directly in HDFS or in other data storage systems such as HBase
- Query execution through MapReduce jobs.
- An SQL-like language called HiveQL that facilitates querying and managing large data sets residing in Hadoop.
4. What are the limitations of Hive?
Below are the limitations of Hive:
- Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.
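- Hive does not provide record-level update, insert, or delete (newer releases add limited support through ACID transactional tables).
- Hive queries have higher latency than queries on a traditional relational database, because of the start-up overhead of the MapReduce jobs submitted for each Hive query.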
- As Hadoop is a batch-oriented system, Hive doesn’t support OLTP (Online Transaction Processing).
- Hive is close to OLAP (Online Analytic Processing) but not ideal, since there is significant latency between issuing a query and receiving a reply, due both to the overhead of MapReduce jobs and to the size of the data sets Hadoop was designed to serve.
- If we need OLTP features on large data sets, we need to use a NoSQL database such as HBase, which can be integrated with Hadoop.
5. What are the differences between Hive and HBase?
- Hive is not a database but a data warehousing framework. Hive doesn’t provide record-level operations on tables.
- HBase is a NoSQL database, and it provides record-level updates, inserts, and deletes on table data.
- HBase doesn’t provide a query language like SQL, but Hive is now integrated with
6. What is Hive Metastore?
The metastore is the central repository of Hive metadata. It is divided into two pieces: a service and the backing store for the data. By default, the metastore service runs in the same process as the Hive service. Using the metastore service (started with hive --service metastore), it is possible to run the metastore as a standalone (remote) process. Set the METASTORE_PORT environment variable to specify the port the server will listen on.
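As a rough sketch of the standalone setup described above, the snippet below sets the port and starts the metastore service in the background. The port value 9083 is the conventional default and is used here as an assumption; the check for the hive command is only there so the sketch fails gracefully on a machine without Hive.

```shell
# Port the metastore server should listen on (assumption: 9083, the usual default).
export METASTORE_PORT="${METASTORE_PORT:-9083}"

if command -v hive >/dev/null 2>&1; then
  # Run the metastore as its own process, separate from the Hive service.
  hive --service metastore &
  echo "Metastore starting on port ${METASTORE_PORT} (pid $!)"
else
  echo "hive not found on PATH; skipping metastore start" >&2
fi
```

Clients would then be pointed at this server instead of opening the backing store themselves.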
7. Whenever we run a Hive query from a different directory, it creates a new metastore_db. What is the reason for this?
When Hive runs in embedded mode, it creates a local metastore. Before creating it, Hive checks whether a metastore already exists at the configured location. That location is defined in the configuration file hive-site.xml.
The property is "javax.jdo.option.ConnectionURL", with the default value "jdbc:derby:;databaseName=metastore_db;create=true". Because metastore_db is a relative path, it is resolved against the directory from which Hive is started, so each new directory gets its own metastore.
To change this behavior, set databaseName to an absolute path, so that the same metastore is used regardless of where Hive is started.
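The change can be sketched as follows: build the same Derby URL with an absolute path and write the matching property block to a scratch file. The directory /var/lib/hive/metastore and the file name hive-site-fragment.xml are hypothetical choices for illustration; only the property name and URL format come from the answer above.

```shell
# Assumption: METASTORE_DIR is a hypothetical absolute directory chosen for this sketch.
METASTORE_DIR="${METASTORE_DIR:-/var/lib/hive/metastore}"
# Assumption: OUT_FILE is a hypothetical scratch file, not hive-site.xml itself.
OUT_FILE="${OUT_FILE:-hive-site-fragment.xml}"

# Same Derby URL as the default, but with an absolute databaseName path.
CONNECTION_URL="jdbc:derby:;databaseName=${METASTORE_DIR}/metastore_db;create=true"

# Write the property block that would go inside <configuration> in hive-site.xml.
cat > "$OUT_FILE" <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>${CONNECTION_URL}</value>
</property>
EOF

echo "Wrote ${OUT_FILE} with ${CONNECTION_URL}"
```

The generated block would be merged by hand into the configuration element of hive-site.xml. Writing to a scratch file first avoids overwriting an existing configuration.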
8. What are the different types of Hive Metastore?
Below are three different types of metastore.
- Embedded Metastore
- Local Metastore
- Remote Metastore
9. What is the default Hive warehouse directory?
It is the /user/hive/warehouse directory on HDFS (the cluster's default file system), as set by the hive.metastore.warehouse.dir property.
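One way to check the warehouse location on a given installation is to ask Hive for the property and then list the directory. This is a sketch that assumes the hive and hdfs commands are on PATH and that the warehouse lives on HDFS; the WAREHOUSE_DIR variable is introduced here only for illustration.

```shell
# Default warehouse path named in the answer above.
WAREHOUSE_DIR="/user/hive/warehouse"

if command -v hive >/dev/null 2>&1; then
  # Print the value Hive is actually configured with.
  hive -e 'SET hive.metastore.warehouse.dir;'
else
  echo "hive not found on PATH; skipping property lookup" >&2
fi

if command -v hdfs >/dev/null 2>&1; then
  # Assumption: the warehouse is on HDFS; tables and databases appear as subdirectories.
  hdfs dfs -ls "$WAREHOUSE_DIR"
else
  echo "hdfs not found on PATH; skipping directory listing" >&2
fi
```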
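10. How do we start the Hive Thrift server?
We can issue the command below from a terminal to start the Hive Thrift server. In newer releases, HiveServer2 (hive --service hiveserver2) replaces the original server.
$ hive --service hiveserver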