Sometimes none of the built-in Hadoop Writable data types matches our requirements. In such cases, we can create a custom Hadoop data type by implementing the Writable interface or the WritableComparable interface.
Common Rules for creating custom Hadoop Writable Data Type
- A custom Hadoop writable data type that is used as a value field in MapReduce programs must implement the Writable interface (org.apache.hadoop.io.Writable).
- MapReduce key types must be comparable against each other for sorting purposes. A custom Hadoop writable data type that is used as a key field in MapReduce programs must implement the WritableComparable interface, which in turn extends the Writable (org.apache.hadoop.io.Writable) and Comparable (java.lang.Comparable) interfaces.
- Thus, a data type created by implementing the WritableComparable interface can be used as either a key or a value field data type.
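The two interface contracts described above boil down to three methods. The sketch below reproduces their signatures locally for reference (in a real job you would import them from org.apache.hadoop.io rather than declare them yourself):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Local reproduction of the org.apache.hadoop.io.Writable contract.
interface Writable {
    void write(DataOutput out) throws IOException;    // serialize this object's fields to the stream
    void readFields(DataInput in) throws IOException; // deserialize fields back from the stream
}

// WritableComparable adds nothing of its own beyond Comparable's compareTo(),
// which the shuffle/sort phase uses to order keys.
interface WritableComparable<T> extends Writable, Comparable<T> { }
```

Any class implementing WritableComparable therefore supplies write(), readFields(), and compareTo(), and can sit on either side of a key-value pair.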
Since a data type implementing WritableComparable can serve as either a key or a value field in MapReduce programs, let's define a custom data type that can be used for both. In this post, we create a custom data type to process web logs from a server and count the occurrences of each IP address. In this sample, we consider a web log record with five fields – Request No, Site URL, Request Date, Request Time and IP Address. A sample record from the web log file is shown below.
We can model the fields of the above record with built-in Writable data types composed into a new custom data type: the Request No as an IntWritable and the other four fields as Text. The complete input file Web_Log.txt used in this post is attached here.
Creating Custom Hadoop Writable Data Type
Let's create a WebLogWritable data type to serialize and deserialize the above-mentioned web log record.
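A minimal sketch of such a class is shown below. The field names are illustrative. With Hadoop on the classpath the class would implement org.apache.hadoop.io.WritableComparable&lt;WebLogWritable&gt; and would typically hold IntWritable/Text members and delegate to their own write()/readFields(); here, so the sketch compiles without Hadoop jars, it implements plain Comparable and serializes the raw int and String values directly with java.io:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Sketch of a custom writable for one web log record.
// In a real job: class WebLogWritable implements WritableComparable<WebLogWritable>
class WebLogWritable implements Comparable<WebLogWritable> {
    private int requestNo;            // IntWritable in a Hadoop-backed version
    private String siteUrl = "";      // Text in a Hadoop-backed version,
    private String requestDate = "";  // likewise for the remaining fields
    private String requestTime = "";
    private String ipAddress = "";

    // A no-argument constructor is required so the framework can
    // instantiate the object before calling readFields().
    WebLogWritable() { }

    WebLogWritable(int requestNo, String siteUrl, String requestDate,
                   String requestTime, String ipAddress) {
        this.requestNo = requestNo;
        this.siteUrl = siteUrl;
        this.requestDate = requestDate;
        this.requestTime = requestTime;
        this.ipAddress = ipAddress;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(requestNo);
        out.writeUTF(siteUrl);
        out.writeUTF(requestDate);
        out.writeUTF(requestTime);
        out.writeUTF(ipAddress);
    }

    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order they were written.
        requestNo = in.readInt();
        siteUrl = in.readUTF();
        requestDate = in.readUTF();
        requestTime = in.readUTF();
        ipAddress = in.readUTF();
    }

    // Since we count occurrences per IP address, keys are ordered by IP.
    @Override
    public int compareTo(WebLogWritable other) {
        return ipAddress.compareTo(other.ipAddress);
    }

    public String getIpAddress() { return ipAddress; }
}
```

When such a type is used as a key with the default hash partitioner, it is also good practice to override equals() and hashCode() consistently with compareTo().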