In this post, we will cover the basics of Google’s Snappy compression technique.
Snappy Compression Introduction:
- Snappy is a fast compression/decompression library. It was formerly known as Zippy.
- Snappy is written in C++ and is widely used inside Google, in everything from BigTable and MapReduce to Google’s internal RPC systems.
- It is also used in open-source projects such as Cassandra, Hadoop and Lucene.
- It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.
- Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more (figures Google reports for a single core of a Core i7 processor in 64-bit mode).
- Snappy encoding is not bit-oriented, but byte-oriented (only whole bytes are emitted or consumed from a stream).
- Snappy is distributed through Google Code –> http://code.google.com/p/snappy/
Snappy Installation on Ubuntu:
1. The g++, make and build-essential packages must be installed before installing Snappy. On Ubuntu they can be installed with apt-get.
2. Download the Snappy tarball from http://code.google.com/p/snappy/, unpack it and change into the extracted folder.
3. From a terminal inside the Snappy folder, run the configure, build and install steps in that order (typically ./configure, make and sudo make install).
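The three steps above can be sketched as the shell functions below. This is a minimal sketch rather than the exact commands from the original setup: it assumes an autotools-style tarball (one that ships a configure script), and the function names, the tarball path and the build directory are placeholders we made up for illustration.

```shell
# Step 1: install the compiler toolchain with apt-get.
install_build_deps() {
    sudo apt-get update &&
    sudo apt-get install -y g++ make build-essential
}

# Steps 2 and 3: unpack the tarball, then configure, build and install.
# $1: path to the downloaded Snappy tarball (hypothetical name, use your own)
# $2: directory to unpack and build in
build_and_install_snappy() {
    local tarball="$1" build_dir="$2"
    mkdir -p "$build_dir" &&
    # --strip-components=1 drops the top-level folder inside the tarball,
    # so we do not need to know its exact name.
    tar -xf "$tarball" -C "$build_dir" --strip-components=1 &&
    (
        cd "$build_dir" &&
        ./configure &&      # assumes the tarball ships a configure script
        make &&
        sudo make install   # installs under /usr/local by default
    )
}
```

To use it, we would call install_build_deps once and then something like build_and_install_snappy ./snappy.tar.gz ./snappy-src, replacing both arguments with our own paths.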
If the install step completes without errors, the Snappy libraries are installed into the /usr/local/lib directory automatically.
Snappy Compression Configuration For Hadoop:
Hadoop generally expects the Snappy shared libraries to be available in its native library directory ($HADOOP_HOME/lib/native/).
If there are no libsnappy.so* files in the Hadoop native library directory, we need to find out whether our Hadoop distribution was built with Snappy integration by default.
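A quick way to look for those files from the terminal is sketched below. The helper name find_snappy_libs is our own invention, and the example call assumes HADOOP_HOME is set in the environment.

```shell
# Print every libsnappy.so* file found in a directory.
# Returns 0 if at least one was found, 1 otherwise.
find_snappy_libs() {
    local dir="$1"
    local f found=1
    for f in "$dir"/libsnappy.so*; do
        # When nothing matches, the glob stays literal, so test for existence.
        if [ -e "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}

# Example (assumes HADOOP_HOME is set):
# find_snappy_libs "$HADOOP_HOME/lib/native" || echo "no libsnappy.so* found"
```

Hadoop 2.x releases also ship a hadoop checknative -a command that reports whether the native Snappy library can be loaded, which is worth trying if our version has it.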
This can be verified by uploading the sample .snappy file below into an HDFS directory and trying to read it with Hadoop’s fs -text command.
Sample snappy file –> sample.snappy
Download the sample.snappy file above, unzip it and put it into HDFS.
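The upload and read-back steps could look like the sketch below. The function name and the HDFS target path are hypothetical; fs -put and fs -text are the standard Hadoop file system shell commands.

```shell
# Upload a local .snappy file to HDFS, then try to decode it with fs -text.
# $1: local file (for example sample.snappy)
# $2: full HDFS target path (hypothetical, pick one we can write to)
check_snappy_via_text() {
    local local_file="$1" hdfs_path="$2"
    hadoop fs -put "$local_file" "$hdfs_path" &&
    # fs -text picks a decompression codec from the file extension, so this
    # only prints readable content when Hadoop can load Snappy.
    hadoop fs -text "$hdfs_path"
}

# Example:
# check_snappy_via_text sample.snappy /tmp/sample.snappy
```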
If fs -text fails with an error message instead of printing the file contents, it usually means Snappy is not installed or configured properly on Hadoop.
After installing Snappy on Ubuntu, we usually need to copy the libsnappy*.so* files from /usr/local/lib into the $HADOOP_HOME/lib/native/ directory and set the LD_LIBRARY_PATH and JAVA_LIBRARY_PATH environment variables to point to the Hadoop native library directory. If this approach works, well and good.
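The copy step can be sketched as follows. The helper name copy_snappy_libs is ours, and the example call assumes Snappy was installed under /usr/local/lib and that HADOOP_HOME is set.

```shell
# Copy the Snappy shared libraries into Hadoop's native library directory.
# $1: directory holding libsnappy*.so* (for example /usr/local/lib)
# $2: Hadoop native library directory
copy_snappy_libs() {
    local src_dir="$1" native_dir="$2"
    mkdir -p "$native_dir" &&
    # -P copies symlinks as symlinks, so an unversioned libsnappy.so link
    # keeps pointing at the versioned file copied next to it.
    cp -P "$src_dir"/libsnappy*.so* "$native_dir"/
}

# Example (assumes HADOOP_HOME is set):
# copy_snappy_libs /usr/local/lib "$HADOOP_HOME/lib/native"
```

Depending on who owns the Hadoop directory, the copy may need to be run with sudo.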
Otherwise, the best and simplest method is to rely on recent Hadoop distributions: starting from hadoop-2.5.0, they come with Snappy already installed. Even if we are using an older version of Hadoop, we can download hadoop-2.5.0 from the Apache download mirrors and copy all lib*.* files from the hadoop-2.5.0/lib/native/ directory into our Hadoop native library location, $HADOOP_HOME/lib/native/.
After copying these files, define the two environment variables, LD_LIBRARY_PATH and JAVA_LIBRARY_PATH, in the .bashrc file so that they point to the Hadoop native library directory.
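One way to add the two variables to .bashrc without duplicating them on repeated runs is sketched below. The function name is hypothetical, and the sketch sets each variable to the native directory only, replacing any value it had before, which is an assumption about what we want.

```shell
# Append the two export lines to a shell startup file unless already present.
# $1: startup file (normally ~/.bashrc)
# $2: Hadoop native library directory
add_native_lib_exports() {
    local rc_file="$1" native_dir="$2"
    local var line
    for var in LD_LIBRARY_PATH JAVA_LIBRARY_PATH; do
        line="export $var=\"$native_dir\""
        # -x matches the whole line, -F treats the pattern as a fixed string.
        grep -qxF "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
    done
}

# Example (assumes HADOOP_HOME is set):
# add_native_lib_exports ~/.bashrc "$HADOOP_HOME/lib/native"
```

After editing .bashrc, we need to run source ~/.bashrc or open a new terminal for the variables to take effect.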