In this post, we will discuss a few more Hadoop Hive interview questions and answers for Hadoop freshers and experienced developers.
Hive Interview Questions and Answers
1. What are the types of tables in Hive?
There are two types of tables.
- Managed tables
- External tables
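The two differ mainly in what happens when a table is dropped: dropping a managed table deletes both the metadata and the data, while dropping an external table deletes only the metadata and leaves the data files in place. Otherwise, both types of tables are very similar.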
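The difference on drop can be sketched in HiveQL as follows. The table names, columns, and the HDFS path are hypothetical and only serve as an illustration.

```sql
-- Managed table: Hive owns the data under its warehouse directory.
CREATE TABLE employee_managed (
  empid INT,
  name  STRING
);

-- External table: Hive tracks only the metadata; the data stays at LOCATION.
-- The path below is an assumed example location.
CREATE EXTERNAL TABLE employee_external (
  empid INT,
  name  STRING
)
LOCATION '/data/employee_external';

DROP TABLE employee_managed;   -- removes the metadata and the data files
DROP TABLE employee_external;  -- removes the metadata only; the files remain
```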
2. What kind of data warehouse application is suitable for Hive?
Hive is not a full database. The design constraints and limitations of Hadoop and HDFS
impose limits on what Hive can do.
Hive is most suited for data warehouse applications, where
- Relatively static data is analyzed,
- Fast response times are not required, and
- The data is not changing rapidly.
3. Does Hive provide OLTP or OLAP?
Hive doesn’t provide crucial features required for OLTP (Online Transaction Processing).
It’s closer to being an OLAP (Online Analytical Processing) tool. So, Hive is best suited for
data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.
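4. Does Hive support record-level insert, delete, or update?
Not in the traditional sense. Older versions of Hive do not provide record-level update, insert, or delete, and therefore
do not provide transactions either. Since Hive 0.14, these operations are available only on tables explicitly configured as transactional (ACID). On ordinary tables, users can rewrite the table with CASE expressions and Hive's built-in functions to achieve the effect of these DML operations. Thus, a complex update query in
an RDBMS may need many lines of code in Hive.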
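As a minimal sketch of the CASE-based approach, assuming a hypothetical non-transactional table employee(empid INT, name STRING, salary DOUBLE), an update or delete can be emulated by rewriting the whole table:

```sql
-- Emulates: UPDATE employee SET salary = salary * 1.1 WHERE empid = 101
-- The entire table is rewritten; only the matching row gets a new value.
INSERT OVERWRITE TABLE employee
SELECT
  empid,
  name,
  CASE WHEN empid = 101 THEN salary * 1.1 ELSE salary END AS salary
FROM employee;

-- Emulates: DELETE FROM employee WHERE empid = 101
-- Every row except the one to be removed is written back.
INSERT OVERWRITE TABLE employee
SELECT empid, name, salary
FROM employee
WHERE empid <> 101;
```

Because each statement rewrites all of the table's data, this pattern suits occasional batch corrections rather than frequent single-row changes.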
5. How can we change a column data type in Hive?
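We can use the ALTER TABLE command with the CHANGE clause to alter the data type of a column in Hive.
Example: Suppose we want to change the data type of the empid column from int to bigint in a
table called employee. The CHANGE clause takes the existing column name, the new column name (the same name if only the type changes), and the new type.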
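For the employee example, the statement would look like this (a sketch; the column name is repeated because it is not being renamed):

```sql
-- Change the type of empid from INT to BIGINT, keeping the same column name.
ALTER TABLE employee CHANGE empid empid BIGINT;
```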
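6. How can we copy the columns of a Hive table into a file?
The output of the HiveQL DESCRIBE command can be piped through the awk command in a shell and redirected to a file. For example, running hive -e with a DESCRIBE statement and printing the first field of each line with awk yields the column names.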
7. How to rename a table in Hive?
Using the ALTER TABLE command with the RENAME TO clause, we can rename a table in Hive.
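A short sketch, with hypothetical table names:

```sql
-- Rename the table employee to employee_archive.
ALTER TABLE employee RENAME TO employee_archive;
```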
8. Is there any alternative way to rename a table without ALTER command?
Yes. By using the EXPORT and IMPORT commands, we can effectively rename a table: EXPORT saves the table's data and metadata to an HDFS directory, and IMPORT loads them back under a new table name.
If we only want to preserve the data, we can instead create a new table from the old one with a CREATE TABLE ... AS SELECT statement.
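Both approaches can be sketched as follows. The table names and the HDFS export path are assumptions chosen for illustration.

```sql
-- Approach 1: export the table to HDFS, then import it under a new name.
EXPORT TABLE employee TO '/tmp/employee_export';
IMPORT TABLE employee_new FROM '/tmp/employee_export';

-- Approach 2: copy only the data into a new table (CTAS).
CREATE TABLE employee_copy AS
SELECT * FROM employee;

-- In either case the original table still exists afterwards;
-- drop it once the copy has been verified, if a true rename is intended.
-- DROP TABLE employee;
```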
9. What is the difference between ORDER BY and SORT BY in Hive?
- SORT BY will sort the data within each reducer. We can use any number of reducers
for SORT BY operation.
- ORDER BY will sort all of the data together, which has to pass through one reducer.
Thus, ORDER BY in Hive uses a single reducer.
- ORDER BY guarantees total order in the output, while SORT BY only guarantees
ordering of the rows within each reducer. If there is more than one reducer, SORT BY may give partially ordered final results.
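The contrast can be sketched with two queries on a hypothetical employee table with a salary column. The reducer count is set with the mapred.reduce.tasks property here as an assumed example; the appropriate property and value depend on the Hive and Hadoop versions in use.

```sql
-- Ask for two reducers so the difference becomes visible.
SET mapred.reduce.tasks=2;

-- SORT BY: each reducer sorts only its own share of the rows,
-- so the combined output may be only partially ordered.
SELECT empid, salary
FROM employee
SORT BY salary DESC;

-- ORDER BY: all rows pass through a single reducer,
-- so the output is totally ordered.
SELECT empid, salary
FROM employee
ORDER BY salary DESC;
```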
10. What is the DOUBLE data type in Hive?
Hive may present DOUBLE data differently from an RDBMS: values can appear in
scientific notation, for example 1.28893E4.
E4 represents 10^4 here. So, the value 1.28893E4 represents 12889.3. Only the
display format differs; calculations on the double type use the underlying value.
This matters when exporting double data to an RDBMS, since the type may be
wrongly interpreted. So, it is advisable to cast the double type to an appropriate type before