Enable Compression in Hive


For data-intensive workloads, I/O operations and network data transfer take considerable time to complete. By enabling compression in Hive, we can improve the performance of Hive queries and also save storage space on the HDFS cluster.

Find Available Compression Codecs in Hive

To enable compression in Hive, we first need to find out the available compression codecs on the Hadoop cluster. We can use the set command below to list them.
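From the Hive shell, printing the Hadoop property io.compression.codecs shows the codecs registered on the cluster. A minimal sketch (the exact list varies per cluster and Hadoop version):

```sql
hive> SET io.compression.codecs;
-- Typical output (cluster-dependent), a comma-separated list such as:
-- io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,
--   org.apache.hadoop.io.compress.GzipCodec,
--   org.apache.hadoop.io.compress.BZip2Codec,
--   org.apache.hadoop.io.compress.SnappyCodec
```

Any codec that appears in this list can be used in the compression properties discussed below.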

Enable Compression on Intermediate Data

A complex Hive query is usually converted to a series of multi-stage MapReduce jobs after submission, and these jobs will be chained up by the Hive engine to complete the entire query. So “intermediate output” here refers to the output from the previous MapReduce job, which will be used to feed the next MapReduce job as input data.

We can enable compression on Hive intermediate output by setting the property hive.exec.compress.intermediate to true, either from the Hive shell using the set command or at the site level in the hive-site.xml file.
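A sketch of the site-level configuration in hive-site.xml; SnappyCodec is chosen here as an illustrative codec, assuming it is available on the cluster:

```xml
<!-- hive-site.xml: compress intermediate output between MapReduce stages -->
<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
</property>
<property>
  <name>hive.intermediate.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>hive.intermediate.compression.type</name>
  <value>BLOCK</value>
</property>
```

Snappy is a common choice for intermediate data because it favors speed over compression ratio, which suits short-lived data passed between job stages.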

Alternatively, we can set the same properties in the Hive shell with set commands, as shown below.
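The session-level equivalent of the hive-site.xml settings (again assuming SnappyCodec is present on the cluster):

```sql
hive> SET hive.exec.compress.intermediate=true;
hive> SET hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
hive> SET hive.intermediate.compression.type=BLOCK;
```

Settings made this way apply only to the current Hive session, which is convenient for testing before committing them to hive-site.xml.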

Enable Compression on Final Output

We can enable compression on the final output in the Hive shell by setting the properties below.
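A session-level sketch; GzipCodec is used here as an example codec:

```sql
hive> SET hive.exec.compress.output=true;
hive> SET mapreduce.output.fileoutputformat.compress=true;
hive> SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
```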

or
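The same settings can be made permanent at the site level. A sketch for hive-site.xml (the mapreduce.* properties can equally live in mapred-site.xml):

```xml
<!-- hive-site.xml: compress the final output of Hive queries -->
<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```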

Example Table Creation with Compression Enabled

In the shell snippet below, we create a new table compressed_emp from the existing testemp table after setting the compression properties to true in the Hive shell.

Source Table: testemp contents
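The source table can be inspected in the Hive shell; the actual rows depend on the data loaded into testemp:

```sql
hive> SELECT * FROM testemp;
```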

Setting Compression properties in Hive Shell:
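The properties set for this example, using gzip for the final output:

```sql
hive> SET hive.exec.compress.output=true;
hive> SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
```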

Target Table compressed_emp Creation:
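With compression enabled for the session, a CREATE TABLE AS SELECT (CTAS) statement writes the new table's files in compressed form. A minimal sketch:

```sql
hive> CREATE TABLE compressed_emp
    > AS SELECT * FROM testemp;
```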

Thus the output files are created in gzip format, and we can view their contents with the dfs -text command, which decompresses recognized codecs on the fly.
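For example, from within the Hive shell (the warehouse path below is the common default and may differ on your cluster; the file name 000000_0.gz is illustrative):

```sql
hive> dfs -ls /user/hive/warehouse/compressed_emp;
hive> dfs -text /user/hive/warehouse/compressed_emp/000000_0.gz;
```

Unlike dfs -cat, dfs -text detects the gzip extension and prints the decompressed records.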


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
