Hadoop provides Writable interface based data types for serialization and de-serialization of data storage in HDFS and mapreduce computations.
Serialization is the process of converting object data into byte stream data for transmission over a network across different nodes in a cluster or for persistent data storage.
Deserialization is the reverse process of serialization and converts byte stream data into object data for reading data from HDFS. Hadoop provides Writables for serialization and deserialization purpose.
Writable and WritableComparable Interfaces
To provide mechanisms for serialization and deserialization of data, Hadoop provided two important interfaces Writable and WritableComparable. Writable interface specification is as follows:
WritableComparable interface is sub-interface of Hadoop’s Writable and Java’s Comparable interfaces. and its specification is shown below:
The standard java.lang.Comparable Interface contains single method compareTo() method for comparing the operators passed to it.
The compareTo() method returns -1 , 0 , or 1 depending on whether the compared object is less than, equal to, or greater than the current object.
The above two interfaces are provided in org.apache.hadoop.io package
Constraints on Key-values in Mapreduce
Hadoop data types used in Mapreduce for key or value fields must satisfy two constraints.
- Any data type used for a Value field in mapper or reducer input/output must implement Writable Interface.
- Any data type used for a Key field in mapper or reducer input/output must implement WritableComparable interface along with Writable interface to compare the keys of this type with each other for sorting purposes.
Writable Classes – Hadoop Data Types
Hadoop provides classes that wrap the Java primitive types and implement the WritableComparable and Writable Interfaces. They are provided in the org.apache.hadoop.io package.
All the Writable wrapper classes have a get() and a set() method for retrieving and storing the wrapped value.
Primitive Writable Classes
These are Writable Wrappers for Java primitive data types and they hold a single primitive value that can be set either at construction or via a setter method.
All these primitive writable wrappers have get() and set() methods to read or write the wrapped value. Below is the list of primitive writable data types available in Hadoop.
In the above list VIntWritable and VLongWritable are used for variable length Integer types and variable length long types respectively.
Serialized sizes of the above primitive writable data types are same as the size of actual java data type. So, the size of IntWritable is 4 bytes and LongWritable is 8 bytes.
Array Writable Classes
Hadoop provided two types of array writable classes, one for single-dimensional and another for two-dimensional arrays. But the elements of these arrays must be other writable objects like IntWritable or LongWritable only but not the java native data types like int or float.
Map Writable Classes
Hadoop provided below MapWritable data types which implement java.util.Map interface
- AbstractMapWritable – This is abstract or base class for other MapWritable classes.
- MapWritable – This is a general purpose map mapping Writable keys to Writable values.
- SortedMapWritable – This is a specialization of the MapWritable class that also implements the SortedMap interface.
Other Writable Classes
NullWritable is a special type of Writable representing a null value. No bytes are read or written when a data type is specified as NullWritable. So, in Mapreduce, a key or a value can be declared as a NullWritable when we don’t need to use that field.
This is a general-purpose generic object wrapper which can store any objects like Java primitives, String, Enum, Writable, null, or arrays.
Text can be used as the Writable equivalent of java.lang.String and It’s max size is 2 GB. Unlike java’s String data type, Text is mutable in Hadoop.
BytesWritable is a wrapper for an array of binary data.
It is similar to ObjectWritable but supports only a few types. User need to subclass this GenericWritable class and need to specify the types to support.
Example Program to Test Writables
Lets write a WritablesTest.java program to test most of the data types mentioned above in this post with get(), set(), getBytes(), getLength(), put(), containsKey(), keySet() methods.