In this post, we will discuss Hive interview questions and answers for experienced candidates and freshers.
Hive Interview Questions and Answers for Experienced Candidates:
1. How do we start the Hive metastore service as a background process?
We can start the Hive metastore service as a background process from the command line.
To stop the service, we find its process ID and run kill -9 <process id>.
2. How do we configure a Hive remote metastore in the hive-site.xml file?
We can configure a remote metastore by setting the metastore URI property in the hive-site.xml file.
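A minimal Python sketch of such an entry, built with the standard library's xml.etree.ElementTree. It assumes the standard `hive.metastore.uris` property, whose value is the Thrift URI of the remote metastore; the host name `metastore-host` is hypothetical, and 9083 is the metastore's conventional default port.

```python
import xml.etree.ElementTree as ET


def metastore_property(host: str, port: int = 9083) -> ET.Element:
    """Build a hive-site.xml <property> element that points at a remote metastore."""
    prop = ET.Element("property")
    # hive.metastore.uris tells Hive clients where the remote metastore listens.
    ET.SubElement(prop, "name").text = "hive.metastore.uris"
    ET.SubElement(prop, "value").text = f"thrift://{host}:{port}"
    return prop


# Hypothetical host name, for illustration only.
config = ET.Element("configuration")
config.append(metastore_property("metastore-host"))
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

The printed element has the same `<configuration><property><name>…</name><value>…</value></property></configuration>` shape that hive-site.xml uses.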
3. What is the need for partitioning in Hive?
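Partitioning is mainly intended to give queries on Hive tables a quick turnaround time. Each partition is stored in its own directory, so a query that filters on the partition column reads only the matching directories instead of scanning the whole table.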
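The effect of partition pruning can be sketched in a few lines of Python. This is a simplified model, not Hive's implementation: each partition is represented as a directory named `country=<value>` (Hive's key=value directory layout), and the hypothetical `prune` and `query` helpers read only the directories that match the filter.

```python
from typing import Dict, List

# Hypothetical table stored as Hive-style partition directories (key=value).
partitions: Dict[str, List[str]] = {
    "country=US": ["us_row1", "us_row2"],
    "country=UK": ["uk_row1"],
    "country=IND": ["ind_row1", "ind_row2", "ind_row3"],
}


def prune(dirs: Dict[str, List[str]], column: str, value: str) -> List[str]:
    """Keep only the directories a filter like WHERE column = value needs."""
    wanted = f"{column}={value}"
    return [d for d in dirs if d == wanted]


def query(dirs: Dict[str, List[str]], column: str, value: str) -> List[str]:
    rows: List[str] = []
    for d in prune(dirs, column, value):
        rows.extend(dirs[d])  # only the matching partition's data is read
    return rows


print(prune(partitions, "country", "UK"))  # ['country=UK']
print(query(partitions, "country", "UK"))  # ['uk_row1']
```

Without partitioning, the same filter would have to read all six rows of the table.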
4. We already have 3 tables named US, UK and IND in Hive. Now we have one more directory, JPN, created using hadoop fs -mkdir JPN. Can we move the contents of IND to JPN directly?
Yes, we can copy the contents of the IND table's directory in the Hive warehouse into JPN.
5. Now we have to display the contents of US, UK, IND and JPN. Is it possible to display them all using SELECT * queries on the tables?
No. JPN was created with the fs -mkdir command, so it is only an HDFS directory and not part of the metastore's metadata; Hive does not know it as a table.
6. Is it possible for multiple users to use the same metastore in the case of embedded Hive?
No, the embedded metastore (backed by an embedded Derby database) supports only one session at a time, so it cannot be shared. It is recommended to use a
standalone "real" database like MySQL or PostgreSQL.
7. What is HCatalog and how do we use it?
HCatalog is a table and storage management tool for Hadoop/HDFS. In MapReduce, we use it by specifying its input/output formats, i.e. HCatInputFormat and HCatOutputFormat.
In Pig, we use it by specifying its storage types, i.e. HCatLoader and HCatStorer.
8. If we run Hive as a server, what mechanisms are available for connecting to it from an application?
We can connect to the Hive server in the following ways:
- Thrift Client: Using Thrift, we can call Hive commands from various programming languages, e.g. Java, PHP, Python and Ruby.
- JDBC Driver: It supports the Type 4 (pure Java) JDBC driver.
- ODBC Driver: It supports the ODBC protocol.
9. Are multi-line comments supported in Hive scripts?
10. What is SerDe in Apache Hive?
A SerDe is a Serializer/Deserializer. Hive uses SerDes to read data from and write data to tables. An important concept behind Hive is that it does NOT own the format of the data stored in the Hadoop Distributed File System (HDFS). Users can write files to HDFS with whatever tools or mechanisms they like (“CREATE EXTERNAL TABLE” or “LOAD DATA INPATH”) and use Hive to correctly “parse” that file format in a way Hive can use. A SerDe is a powerful and customizable mechanism that Hive uses to “parse” data stored in HDFS.
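A minimal Python sketch of the SerDe contract, assuming a delimited text format: `deserialize` turns one stored line into a row, and `serialize` turns a row back into a line. The class `DelimitedSerDe` is hypothetical and only mirrors the two roles; the default delimiter is Control-A (`\x01`), Hive's default field separator for text tables.

```python
from typing import Dict, List


class DelimitedSerDe:
    """Hypothetical SerDe for delimited text; models only the read/write roles."""

    def __init__(self, columns: List[str], delimiter: str = "\x01") -> None:
        self.columns = columns
        self.delimiter = delimiter

    def deserialize(self, line: str) -> Dict[str, str]:
        # Split the stored line into fields and attach the column names.
        fields = line.split(self.delimiter)
        if len(fields) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} fields, got {len(fields)}")
        return dict(zip(self.columns, fields))

    def serialize(self, row: Dict[str, str]) -> str:
        # Write the fields back in column order, joined by the delimiter.
        return self.delimiter.join(row[c] for c in self.columns)


serde = DelimitedSerDe(["id", "name", "country"])
row = serde.deserialize("1\x01Alice\x01US")
print(row)  # {'id': '1', 'name': 'Alice', 'country': 'US'}
```

Hive itself never needs to know how the file was written; it only needs a SerDe that can read it.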
11. Which classes does Hive use to read and write HDFS files?
Hive uses the following classes to read and write HDFS files:
- TextInputFormat/HiveIgnoreKeyTextOutputFormat: These two classes read/write data in plain text file format.
- SequenceFileInputFormat/SequenceFileOutputFormat: These two classes read/write data in Hadoop SequenceFile format.
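As a rough model of the plain-text pair (Python, not Hadoop code): TextInputFormat hands each line to the reader as a record keyed by the line's byte offset in the file, and HiveIgnoreKeyTextOutputFormat, as its name says, ignores the key and writes only the value, one per line. The helper names below are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_text_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs, like TextInputFormat's key/value records."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next line starts right after this one


def write_text_records(records: Iterable[Tuple[object, str]]) -> bytes:
    """Write only the values, one per line, ignoring the keys."""
    return b"".join(value.encode("utf-8") + b"\n" for _, value in records)


data = b"1\x01Alice\n2\x01Bob\n"
records = list(read_text_records(data))
print(records)  # [(0, '1\x01Alice'), (8, '2\x01Bob')]
```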
12. What are some examples of the SerDe classes that Hive uses to serialize and deserialize data?
Hive currently uses the following SerDe classes to serialize and deserialize data:
- MetadataTypedColumnsetSerDe: This SerDe is used to read/write delimited records like CSV, tab-separated or Control-A-separated records (quoting is not supported yet).
- ThriftSerDe: This SerDe is used to read/write Thrift-serialized objects. The class file for the Thrift object must be loaded first.
- DynamicSerDe: This SerDe also reads/writes Thrift-serialized objects, but it understands Thrift DDL, so the schema of the object can be provided at run time. It also supports many different protocols, including TBinaryProtocol, TJSONProtocol and TCTLSeparatedProtocol.
13. How do we write our own custom SerDe?
In most cases, users want to write a Deserializer instead of a full SerDe, because they only want to read their own data format, not write to it.
For example, the RegexDeserializer deserializes data using the configuration parameter ‘regex’ and, possibly, a list of column names.
- If your SerDe supports DDL (basically, a SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe instead of writing a SerDe from scratch.
- The reason is that the framework passes DDL to the SerDe in “thrift DDL” format, and writing a “thrift DDL” parser is non-trivial.
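The RegexDeserializer approach can be sketched in Python: a regular expression with one capture group per column, plus a list of column names, turns each line into a row. This illustrates the idea under those assumptions and is not the Hive class; `regex_deserialize` and the access-log format are hypothetical, and returning None for non-matching lines is a choice made for the sketch.

```python
import re
from typing import Dict, List, Optional


def regex_deserialize(line: str, regex: str, columns: List[str]) -> Optional[Dict[str, str]]:
    """Map each capture group of `regex` to the column at the same position."""
    match = re.fullmatch(regex, line)
    if match is None:
        return None  # sketch choice: unparseable lines yield no row
    groups = match.groups()
    if len(groups) != len(columns):
        raise ValueError("number of capture groups must equal number of columns")
    return dict(zip(columns, groups))


# Hypothetical access-log line: host, HTTP method, path.
pattern = r"(\S+) (GET|POST) (\S+)"
print(regex_deserialize("10.0.0.1 GET /index.html", pattern, ["host", "method", "path"]))
# {'host': '10.0.0.1', 'method': 'GET', 'path': '/index.html'}
```

Only the read direction exists here, which matches the point above: a custom reader often needs no serializer at all.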
14. What is the functionality of the Query Processor in Apache Hive?
This component implements the processing framework for converting SQL into a graph of map/reduce jobs, and the execution-time framework for running those jobs in the order of their dependencies.
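Running jobs "in the order of their dependencies" amounts to a topological ordering of the job graph. The Python sketch below uses the standard library's graphlib.TopologicalSorter (Python 3.9+) on a hypothetical plan in which two scan jobs feed a join that feeds an aggregation; the job names are invented, and Hive's real planner does far more than this.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each job maps to the set of jobs it depends on.
plan = {
    "scan_orders": set(),
    "scan_customers": set(),
    "join": {"scan_orders", "scan_customers"},
    "aggregate": {"join"},
}

sorter = TopologicalSorter(plan)
sorter.prepare()
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose dependencies have all finished
    stages.append(ready)                # jobs in the same stage could run in parallel
    sorter.done(*ready)

print(stages)  # [['scan_customers', 'scan_orders'], ['join'], ['aggregate']]
```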
15. What is the functionality of ObjectInspector?
Hive uses ObjectInspector to analyze the internal structure of the row object and also the structure of the individual columns. ObjectInspector provides a uniform way to access complex objects that can be stored in multiple formats in memory, including:
- An instance of a Java class (Thrift or native Java)
- A standard Java object (we use java.util.List to represent Struct and Array, and java.util.Map to represent Map)
- A lazily initialized object (for example, a Struct of string fields stored in a single Java string object with a starting offset for each field)
A complex object can be represented by a pair of an ObjectInspector and a Java Object. The ObjectInspector not only tells us the structure of the Object, but also gives us ways to access the internal fields inside the Object.
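To illustrate the idea in Python (these are not Hive's Java interfaces), each hypothetical inspector class below knows how to read a named field from one in-memory representation: a standard list, an object with attributes, or a single string with field offsets that is sliced lazily. The generic `read_city` function only sees the (inspector, object) pair, so it works the same whichever representation the row uses.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


class ListStructInspector:
    """Hypothetical inspector for a row stored as a standard list."""

    def __init__(self, fields: List[str]) -> None:
        self.index = {name: i for i, name in enumerate(fields)}

    def get_field(self, row: List[Any], name: str) -> Any:
        return row[self.index[name]]


class AttributeStructInspector:
    """Hypothetical inspector for a row stored as an object with attributes."""

    def get_field(self, row: Any, name: str) -> Any:
        return getattr(row, name)


class LazyStringStructInspector:
    """Hypothetical inspector for a row stored as one string plus field offsets."""

    def __init__(self, fields: List[str], offsets: List[int]) -> None:
        # Field i starts at offsets[i] and ends where field i + 1 starts.
        self.bounds: Dict[str, Tuple[int, Optional[int]]] = {}
        for i, name in enumerate(fields):
            end = offsets[i + 1] if i + 1 < len(offsets) else None
            self.bounds[name] = (offsets[i], end)

    def get_field(self, row: str, name: str) -> str:
        # The field is sliced out only when it is requested.
        start, end = self.bounds[name]
        return row[start:end]


@dataclass
class Person:
    name: str
    city: str


def read_city(inspector: Any, row: Any) -> Any:
    # Generic code sees only the (inspector, row) pair, never the storage format.
    return inspector.get_field(row, "city")


fields = ["name", "city"]
print(read_city(ListStructInspector(fields), ["Ann", "Oslo"]))          # Oslo
print(read_city(AttributeStructInspector(), Person("Ann", "Oslo")))     # Oslo
print(read_city(LazyStringStructInspector(fields, [0, 3]), "AnnOslo"))  # Oslo
```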