Below are some of the important Hive interview questions and answers for experienced Hadoop developers.
Hive Interview Questions and Answers for experienced
1. What is the Hive configuration precedence order?
There is a precedence hierarchy for setting properties. In the following list, earlier entries take precedence over later ones:
- The Hive SET command
- The command line -hiveconf option
- hive-site.xml
- hive-default.xml
- hadoop-site.xml (or, equivalently, core-site.xml, hdfs-site.xml, and mapred-site.xml)
- hadoop-default.xml (or, equivalently, core-default.xml, hdfs-default.xml, and mapred-default.xml)
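The precedence between the first two entries can be seen in a session like the following sketch. It assumes the CLI was launched with a -hiveconf value for hive.cli.print.header; the launch command is an assumption and is shown only as a comment.

```sql
-- Assumed launch command (shell, not HiveQL):
--   hive -hiveconf hive.cli.print.header=false

SET hive.cli.print.header;        -- shows the value passed with -hiveconf
SET hive.cli.print.header=true;   -- SET overrides the -hiveconf value for this session
SET hive.cli.print.header;        -- shows the value set by the SET command
```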
2. How do we change settings within a Hive session?
We can change settings from within a session, too, using the SET command. This is useful for changing Hive or MapReduce job settings for a particular query. For example, the following command ensures buckets are populated according to the table definition.
hive> SET hive.enforce.bucketing=true;
To see the current value of any property, use SET with just the property name:
hive> SET hive.enforce.bucketing;
By itself, SET will list all the properties and their values set by Hive. This list will not include Hadoop defaults, unless they have been explicitly overridden in one of the ways covered in the above answer. Use SET -v to list all the properties in the system, including Hadoop defaults.
3. How to print header on Hive query results?
We need to use the following SET command before our query to show column headers in STDOUT.
hive> set hive.cli.print.header=true;
4. How to get detailed description of a table in Hive?
Use the following Hive command to get a detailed description of a Hive table.
hive> describe extended <tablename>;
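As a brief sketch, the variants of DESCRIBE can be compared side by side. The table name userdata is a hypothetical placeholder.

```sql
-- userdata is a hypothetical table name
DESCRIBE userdata;            -- column names and types only
DESCRIBE EXTENDED userdata;   -- adds table metadata in a compact, serialized form
DESCRIBE FORMATTED userdata;  -- the same metadata laid out in a more readable form
```

DESCRIBE FORMATTED is often easier to read when we only need details such as the table location or storage format.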
5. How to access sub directories recursively in Hive queries?
To process directories recursively in Hive, we need to set the following two properties in the Hive session. These two parameters work in conjunction.
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
Hive tables can now be pointed at the higher-level directory. This suits a scenario where the directory structure is as follows: /data/country/state/city
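A minimal sketch of this setup follows. The table name city_records and its single column are hypothetical; only the top-level directory /data/country is taken from the structure described above.

```sql
SET mapred.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

-- Hypothetical table: one STRING column holding each raw line.
CREATE EXTERNAL TABLE city_records (
  line STRING
)
STORED AS TEXTFILE
LOCATION '/data/country';

-- With both properties set, the query reads files from the
-- state and city subdirectories under /data/country.
SELECT COUNT(*) FROM city_records;
```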
6. How to skip header rows from a table in Hive?
Suppose that some log files we are processing begin with header records, for example 3 header lines that we do not want to include in our Hive query. To skip them, we can set the table property skip.header.line.count, which tells Hive how many leading lines of each file to ignore. The column definitions are omitted here.
CREATE EXTERNAL TABLE userdata ( ... )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="3");
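For a table that already exists, the header-skipping property skip.header.line.count can also be applied after the fact. The sketch below assumes the userdata table from above and 3 header lines per file.

```sql
-- Skip the first 3 lines of every file backing the table.
ALTER TABLE userdata SET TBLPROPERTIES ("skip.header.line.count"="3");

-- Confirm which table properties are now set.
SHOW TBLPROPERTIES userdata;
```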
7. Is it possible to create multiple tables in Hive for the same data?
Yes. Hive applies a schema on top of an existing data file, so one data file can have multiple schemas. Each schema is saved in Hive's metastore, and the data is not parsed or serialized to disk in that schema. The schema is applied only when we retrieve the data. For example, if the data file has 5 columns (name, job, dob, id, salary), we can define multiple schemas over it by choosing any number of columns from that list (a table with 3 columns, 5 columns, or 6 columns).
However, if a query refers to a column beyond those present in the file, that column returns NULL values.
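The idea can be sketched with three external tables over one directory. The table names, the comma delimiter, the path /user/data/emp, and the extra bonus column are all assumptions made for illustration.

```sql
-- All three tables read the same files; only the schema differs.
CREATE EXTERNAL TABLE emp_short (name STRING, job STRING, dob STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/emp';

CREATE EXTERNAL TABLE emp_full (name STRING, job STRING, dob STRING, id INT, salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/emp';

CREATE EXTERNAL TABLE emp_wide (name STRING, job STRING, dob STRING, id INT, salary INT, bonus INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/data/emp';

-- The files hold only 5 fields per row, so the sixth column is NULL.
SELECT name, bonus FROM emp_wide;
```

Because the tables are external, dropping any one of them removes only its metastore entry and leaves the shared data files in place.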
8. What is the maximum size of string data type supported by Hive?
Maximum size is 2 GB.
9. What are the Binary Storage formats supported in Hive?
Hive uses the text file format by default, but it also supports the following binary formats:
Sequence files, Avro data files, RCFiles, ORC files, Parquet files
Sequence files: A general-purpose binary format that is splittable, compressible, and row oriented. A typical use case: if we have lots of small files, we can use a sequence file as a container, with the file name as the key and the file content as the value. Sequence files support compression, which can yield a large performance gain.
Avro data files: Like sequence files, these are splittable, compressible, and row oriented, and they additionally support schema evolution and bindings for multiple languages.
RCFiles: Record Columnar Files, a column-oriented storage format. An RCFile breaks the table into row splits; within each split it stores the values of the first column for all rows in the split, followed by the values of each subsequent column.
ORC files: Optimized Row Columnar files.
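In practice, the storage format is chosen with the STORED AS clause when a table is created. The sketch below uses hypothetical table names and a single column, and assumes a Hive version recent enough to accept the AVRO and PARQUET keywords directly.

```sql
-- Hypothetical tables, one per binary format.
CREATE TABLE logs_seq     (line STRING) STORED AS SEQUENCEFILE;
CREATE TABLE logs_avro    (line STRING) STORED AS AVRO;
CREATE TABLE logs_rc      (line STRING) STORED AS RCFILE;
CREATE TABLE logs_orc     (line STRING) STORED AS ORC;
CREATE TABLE logs_parquet (line STRING) STORED AS PARQUET;
```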
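10. Is HQL case sensitive?
HQL is not case sensitive: keywords, function names, and table and column names can be written in any case. String values in the data are still compared case-sensitively.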
11. Describe CONCAT function in Hive with Example?
The CONCAT function concatenates the input strings. We can specify any number of strings separated by commas.
Example: CONCAT('Hive','-','is','-','a','-','data warehouse','-','in Hadoop');
Output: Hive-is-a-data warehouse-in Hadoop
Here we repeat the delimiter '-' between every pair of strings. When the delimiter is the same for all the strings, Hive provides another function, CONCAT_WS, where we specify the delimiter first.
Syntax: CONCAT_WS('-','Hive','is','a','data warehouse','in Hadoop');
Output: Hive-is-a-data warehouse-in Hadoop
12. Describe REPEAT function in Hive with example?
The REPEAT function repeats the input string n times, where n is given as the second argument.
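A short illustrative call, assuming a Hive version that allows SELECT without a FROM clause:

```sql
-- First argument: the string; second argument: the repeat count.
SELECT REPEAT('Hive', 3);   -- returns HiveHiveHive
```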
13. Describe REVERSE function in Hive with example?
REVERSE function will reverse the characters in a string.
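A short illustrative call, assuming a Hive version that allows SELECT without a FROM clause:

```sql
SELECT REVERSE('Hadoop');   -- returns poodaH
```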
14. Describe TRIM function in Hive with example?
The TRIM function removes the leading and trailing spaces from a string.
Example: TRIM('  Hadoop  ');
Output: Hadoop
If we want to remove only the leading or only the trailing spaces, we can use LTRIM and RTRIM, respectively.
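As a quick sketch of the one-sided variants, the calls below use a literal padded with two spaces on each side. LTRIM strips leading spaces and RTRIM strips trailing ones.

```sql
SELECT LTRIM('  Hadoop  ');   -- returns 'Hadoop  ' (trailing spaces kept)
SELECT RTRIM('  Hadoop  ');   -- returns '  Hadoop' (leading spaces kept)
```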
15. Describe RLIKE in Hive with an example?
RLIKE (regular-expression LIKE) is a special operator in Hive: A RLIKE B evaluates to true if any substring of A matches B, where B is treated as a Java regular expression. Users don't need to add the % symbol for a simple match in RLIKE.
Examples: 'Express' RLIKE 'Exp' -> True
'Express' RLIKE '^E.*' -> True (Regular expression)
Moreover, RLIKE comes in handy when a string has extra spaces, because it matches without needing the TRIM function. Suppose A has the value 'Express  ' (with 2 trailing spaces) and B has the value 'Express'. In this situation, RLIKE matches without using TRIM.
'Express  ' RLIKE 'Express' -> True
Note: RLIKE evaluates to NULL if A or B is NULL.
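The contrast with LIKE, and a typical use in a WHERE clause, can be sketched as follows. The table userdata and its name column are hypothetical.

```sql
-- LIKE must match the whole string, so it needs % for a prefix match.
SELECT 'Express' LIKE 'Exp';     -- false
SELECT 'Express' LIKE 'Exp%';    -- true

-- RLIKE matches if any substring fits the regular expression.
SELECT 'Express' RLIKE 'Exp';    -- true

-- Hypothetical table and column: rows whose name starts with E.
SELECT name FROM userdata WHERE name RLIKE '^E';
```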