For data-intensive workloads, I/O operations and network data transfer take considerable time to complete. By enabling compression in Hive, we can improve the performance of Hive queries and also save storage space on the HDFS cluster.
Find Available Compression Codecs in Hive
To enable compression in Hive, we first need to find out which compression codecs are available on the Hadoop cluster. Running the set command on the io.compression.codecs property in the Hive shell lists the available codecs.
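As a minimal sketch, the codec lookup can be run from the Hive shell as follows. The output depends on how the cluster is configured, so none is shown here.

```sql
-- Print the current value of io.compression.codecs: a comma-separated
-- list of codec class names registered on the cluster, such as
-- org.apache.hadoop.io.compress.GzipCodec.
SET io.compression.codecs;
```

If the property comes back undefined, the cluster is probably relying on Hadoop's built-in defaults rather than an explicit list.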
Enable Compression on Intermediate Data
A complex Hive query is usually converted into a series of multi-stage MapReduce jobs after submission, and the Hive engine chains these jobs together to complete the entire query. "Intermediate output" here refers to the output of one MapReduce job that is fed to the next MapReduce job as input data.
We can enable compression on Hive's intermediate output by setting the property hive.exec.compress.intermediate to true, either from the Hive shell using the set command or at site level in the hive-site.xml file.
From the Hive shell, the same property, together with any codec settings, can be set with set commands; settings made this way apply only to the current session.
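A sketch of the session-level settings is shown below. The choice of Snappy as the codec is an assumption made for illustration; any codec available on the cluster can be used instead. The hive.intermediate.compression.* property names are the ones found in Hive 1.x through 3.x and may differ in other versions.

```sql
-- Turn on compression of data passed between the MapReduce stages of a query.
SET hive.exec.compress.intermediate=true;

-- Assumed codec for illustration: Snappy favors speed over compression ratio,
-- which suits short-lived intermediate data.
SET hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- BLOCK compresses groups of records together rather than one record at a time.
SET hive.intermediate.compression.type=BLOCK;
```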
Enable Compression on Final Output
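We can enable compression on the final output from the Hive shell by setting the property hive.exec.compress.output to true, optionally along with the codec to use for the output files.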
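The following is a minimal sketch of those settings. Gzip is assumed as the output codec purely for illustration, and the mapreduce.* property names are the Hadoop 2 and later names (older clusters use the mapred.output.* equivalents).

```sql
-- Compress the files that a query writes as its final result.
SET hive.exec.compress.output=true;

-- Assumed codec for illustration: Gzip gives a good compression ratio,
-- but gzip-compressed text files cannot be split across mappers.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Compression type; this setting only affects SequenceFile output.
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
```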
Example Table Creation with Compression Enabled
In this example, we create a new table compressed_emp from the existing testemp table in Hive after setting the compression properties to true in the Hive shell.
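One way this could look is sketched below. The table names come from the text above; the Gzip codec and the default text storage format are assumptions.

```sql
-- Enable compression for the final output of this session's queries.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Create the new table from the existing one; the files written for
-- compressed_emp are compressed with the codec chosen above.
CREATE TABLE compressed_emp AS
SELECT * FROM testemp;

-- Show the table's HDFS location so the data files can be inspected.
DESCRIBE FORMATTED compressed_emp;
```

With a text-format table and the Gzip codec, the data files under the reported location should carry a .gz extension, and Hive decompresses them transparently when the table is queried.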