Hadoop provides several predefined Mapper and Reducer classes in its Java API, which are helpful for writing simple or default MapReduce jobs. A few of these predefined mapper and reducer classes are described below.
Identity Mapper is the default Mapper class provided by Hadoop, and it is picked automatically when no mapper is specified in the MapReduce driver class. The Identity Mapper class implements the identity function: it writes every input key/value pair directly to the output. It is a generic mapper class and can be used with any key/value data types.
Its class, IdentityMapper, is defined in the old MapReduce API in the org.apache.hadoop.mapred.lib package. Because records pass through unchanged, the map input and output key types must be the same, and the map input and output value types must be the same.
Identity Reducer is the default reducer class provided by Hadoop, and it is picked up by a MapReduce job automatically when no other reducer class is specified in the driver class. Like Identity Mapper, this class performs no processing on the data and simply writes all its input to the output.
It is also a generic reducer class, defined in the old MapReduce API in the org.apache.hadoop.mapred.lib package. Its class name is IdentityReducer.
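As a minimal sketch of how the two identity classes fit into an old-API job, the driver below sets them explicitly. The class name IdentityJobDriver, the job name, and the use of args[0] and args[1] as input and output paths are assumptions made for illustration; the sketch also assumes the default text input format, so records are (byte offset, line) pairs.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical driver class (not part of Hadoop), old mapred API.
public class IdentityJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentityJobDriver.class);
        conf.setJobName("identity-job"); // assumed job name

        // These two calls only make the defaults explicit.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // Assuming the default text input format, records are
        // (LongWritable offset, Text line); the identity classes
        // pass these types through unchanged.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Assumed arguments: args[0] = input path, args[1] = output path.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

Removing the two set...Class calls should leave the job's behavior unchanged, since these classes are the defaults described above.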
Below are a few more classes from the new MapReduce API.
Inverse Mapper is a generic mapper class that swaps each input (key, value) pair into a (value, key) pair in the output.
The InverseMapper class is defined in the org.apache.hadoop.mapreduce.lib.map package.
Token Counter Mapper
This mapper class tokenizes its input data (splits it into words) and writes each word with a count of 1 in (word, 1) key-value format.
The map input key can be of any data type, the map input value and the map output key must be of type Text, and the map output value must be of type IntWritable. So it is not a generic mapper class. The TokenCounterMapper class is in the org.apache.hadoop.mapreduce.lib.map package.
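The (word, 1) emission described above can be modelled in plain Java without a cluster. This is only a sketch of the behaviour, not the Hadoop class itself: the name TokenCountSketch is hypothetical, String and Integer stand in for Text and IntWritable, and whitespace tokenization is assumed.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java model of the (word, 1) emission.
final class TokenCountSketch {
    // Splits a line on whitespace and emits one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(map("to be or not to be"));
    }
}
```

Repeated words are emitted once per occurrence; adding the counts up per word is left to the reducer.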
Regex Mapper extracts the text that matches a given regular expression. The RegexMapper class belongs to the org.apache.hadoop.mapreduce.lib.map package.
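The extraction step can be sketched in plain Java as below. This models the behaviour only: the class name RegexExtractSketch is hypothetical, and the pattern is passed as a method argument here, whereas the Hadoop class is expected to take its pattern from the job configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java model of regex-based extraction.
final class RegexExtractSketch {
    // Returns every substring of line matched by regex, in order of appearance.
    static List<String> matches(String regex, String line) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [404, 17]
        System.out.println(matches("[0-9]+", "error 404 at line 17"));
    }
}
```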
The Chain Mapper class can be used to run multiple mappers in a single map task. The mapper classes run in a chain: the output of the first mapper becomes the input of the second mapper, and so on until the last mapper, whose output is written to the task’s output.
There is no need to specify the output key/value classes for the ChainMapper; they are set by the addMapper() call for the last mapper in the chain.
Its class, ChainMapper, is defined in the org.apache.hadoop.mapreduce.lib.chain package.
Chain Mapper usage pattern
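A minimal sketch of the pattern, chaining two of the predefined mappers described earlier in a map-only job. The driver name ChainMapperDriver, the job name, and the use of args[0] and args[1] as paths are assumptions; each mapper is given an empty Configuration of its own.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class (not part of Hadoop).
public class ChainMapperDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chain-mapper-sketch");
        job.setJarByClass(ChainMapperDriver.class);

        // First mapper: (any key, Text line) -> (Text word, IntWritable 1).
        ChainMapper.addMapper(job, TokenCounterMapper.class,
                Object.class, Text.class, Text.class, IntWritable.class,
                new Configuration(false));

        // Second mapper: its input types match the first mapper's output
        // types, and it swaps them: (word, 1) -> (1, word).
        ChainMapper.addMapper(job, InverseMapper.class,
                Text.class, IntWritable.class, IntWritable.class, Text.class,
                new Configuration(false));

        // Map-only job, so the chain's output is the job's output.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        // Assumed arguments: args[0] = input path, args[1] = output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The order of the addMapper() calls defines the order of the chain, so each mapper's input types should match the output types of the mapper added before it.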
The following reducers are available in the MapReduce API.
Int Sum Reducer outputs the sum of the integer values associated with each reducer input key. The IntSumReducer class is in the org.apache.hadoop.mapreduce.lib.reduce package.
Long Sum Reducer outputs the sum of the long values for each reducer input key. The LongSumReducer class is in the org.apache.hadoop.mapreduce.lib.reduce package.
The Chain Reducer class allows a chain of mapper classes to run after a reducer class within the reduce task. The output of the reducer becomes the input of the first mapper, the output of the first mapper becomes the input of the second mapper, and so on until the last mapper, whose output is written to the task’s output.
Its class, ChainReducer, is defined in the org.apache.hadoop.mapreduce.lib.chain package.
ChainReducer usage pattern
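A minimal sketch of the pattern: one mapper on the map side, then a reducer followed by one mapper inside the reduce task. The driver name ChainReducerDriver, the job name, and the path arguments are assumptions; the chained classes are the predefined ones described earlier.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical driver class (not part of Hadoop).
public class ChainReducerDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chain-reducer-sketch");
        job.setJarByClass(ChainReducerDriver.class);

        // Map side: (any key, Text line) -> (Text word, IntWritable 1).
        ChainMapper.addMapper(job, TokenCounterMapper.class,
                Object.class, Text.class, Text.class, IntWritable.class,
                new Configuration(false));

        // Reduce side, step 1: sum the counts per word.
        ChainReducer.setReducer(job, IntSumReducer.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));

        // Reduce side, step 2: a mapper that runs on the reducer's output
        // and swaps each (word, total) pair into (total, word).
        ChainReducer.addMapper(job, InverseMapper.class,
                Text.class, IntWritable.class, IntWritable.class, Text.class,
                new Configuration(false));

        // Final output types, set explicitly here for clarity.
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        // Assumed arguments: args[0] = input path, args[1] = output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The reducer is registered with setReducer() before any addMapper() call on ChainReducer, because the mappers in the reduce task consume the reducer's output.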
Usage of Predefined Mapper & Reducers in Word Count Example
By using the predefined mapper and reducer classes described above in our Word Count MapReduce example program, we can rewrite the same program in a single driver class, with no custom mapper or reducer code.
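One way such a driver could look is sketched below, using TokenCounterMapper for the map phase and IntSumReducer for the reduce phase. The class name WordCountDriver, the job name, the path arguments, and the choice to reuse IntSumReducer as a combiner are assumptions, not necessarily what the original program did.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical driver class (not part of Hadoop).
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count");
        job.setJarByClass(WordCountDriver.class);

        // Predefined mapper: emits (word, 1) for every token in a line.
        job.setMapperClass(TokenCounterMapper.class);

        // Predefined reducer: sums the counts for each word. Reusing it as
        // a combiner is an optional optimisation (an assumption here);
        // it works because addition is associative and commutative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Assumed arguments: args[0] = input path, args[1] = output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The only code written by hand is the job wiring, which is why the whole program fits in a single driver class.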