Predefined Mapper and Reducer Classes


Hadoop provides several predefined Mapper and Reducer classes in its Java API, and these are helpful for writing simple or default MapReduce jobs. A few of the predefined mapper and reducer classes are described below.

Identity Mapper

IdentityMapper is the default Mapper class provided by Hadoop, and it is picked automatically when no mapper is specified in the MapReduce driver class. It implements the identity function: it writes every input key/value pair directly to the output. It is a generic mapper class and can be used with any key/value data types.

The IdentityMapper class is defined in the old MapReduce API, in the org.apache.hadoop.mapred.lib package. Because it passes records through unchanged, the map input and output key types must be the same, and the map input and output value types must be the same.

Identity Reducer

IdentityReducer is the default reducer class provided by Hadoop; it is picked up automatically by a MapReduce job when no other reducer class is specified in the driver class. Like IdentityMapper, it performs no processing on the data and simply writes all of its input to the output.

It is also a generic reducer class, named IdentityReducer and defined in the old MapReduce API in the org.apache.hadoop.mapred.lib package.
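A minimal old-API driver that relies on these defaults might look like the following sketch (the class name IdentityJob and the key/value types chosen are illustrative assumptions, not from the original example):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Sketch of an old-API driver that copies its input to its output
// using IdentityMapper and IdentityReducer.
public class IdentityJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(IdentityJob.class);
    conf.setJobName("identity-job");

    // These two lines are optional: IdentityMapper and IdentityReducer
    // are what the old API falls back to when nothing is configured.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```

This is a configuration sketch and needs the Hadoop client jars on the classpath and a running cluster (or local mode) to execute.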

A few more classes from the new MapReduce API are described below.

Inverse Mapper

This is a generic mapper class that simply reverses (swaps) its input (key, value) pairs into (value, key) pairs in the output.

The InverseMapper class is defined in the org.apache.hadoop.mapreduce.lib.map package.
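The swap itself is trivial; the following standalone sketch (no Hadoop dependencies, with a hypothetical invert() helper) mirrors what InverseMapper's map() method does for each record:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Standalone illustration of the InverseMapper logic: each (key, value)
// input pair is emitted as a (value, key) output pair.
public class InverseDemo {
  // Hypothetical helper mirroring InverseMapper.map(): swap key and value.
  static <K, V> Map.Entry<V, K> invert(Map.Entry<K, V> pair) {
    return new SimpleEntry<>(pair.getValue(), pair.getKey());
  }

  public static void main(String[] args) {
    Map.Entry<Long, String> in = new SimpleEntry<>(0L, "hadoop");
    Map.Entry<String, Long> out = invert(in);
    System.out.println(out.getKey() + "\t" + out.getValue()); // hadoop	0
  }
}
```

A typical use of the real InverseMapper is as the map step of a "sort by value" job, where the values to be sorted are moved into the key position.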

Token Counter Mapper

This mapper class tokenizes its input data (splits each line into words) and writes each word with a count of 1, i.e. in (word, 1) key/value format. Its type signature is

Mapper&lt;Object, Text, Text, IntWritable&gt;

That is, the map input key can be of any type, while the input value and the map output key must be Text, and the map output value must be IntWritable. So it is not a generic mapper class. The TokenCounterMapper class is in the org.apache.hadoop.mapreduce.lib.map package.
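The per-line logic can be sketched without Hadoop as follows (the emitPairs() helper is a hypothetical stand-in for the map() method, which internally uses a StringTokenizer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Standalone illustration of the TokenCounterMapper logic: split a line
// into tokens and emit a (word, 1) pair for each token.
public class TokenCountDemo {
  // Hypothetical helper mirroring TokenCounterMapper.map() for one line.
  static List<String> emitPairs(String line) {
    List<String> pairs = new ArrayList<>();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      pairs.add(itr.nextToken() + "\t1");
    }
    return pairs;
  }

  public static void main(String[] args) {
    // "hello hadoop hello" yields three (word, 1) pairs
    System.out.println(emitPairs("hello hadoop hello"));
  }
}
```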

Regex Mapper

This mapper class extracts the pieces of text that match a given regular expression. The RegexMapper class belongs to the org.apache.hadoop.mapreduce.lib.map package.
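The extraction step amounts to running a matcher over each input line and emitting every match; the matches() helper below is a hypothetical standalone stand-in for that logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone illustration of the RegexMapper logic: emit every substring
// of the input that matches the configured regular expression.
public class RegexDemo {
  // Hypothetical helper mirroring RegexMapper.map() for one input line.
  static List<String> matches(String line, String regex) {
    List<String> out = new ArrayList<>();
    Matcher m = Pattern.compile(regex).matcher(line);
    while (m.find()) {
      out.add(m.group());
    }
    return out;
  }

  public static void main(String[] args) {
    // Pull "error=NNN" tokens out of a log-style line
    System.out.println(matches("error=404 ok=200 error=500", "error=\\d+"));
  }
}
```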

Chain Mapper

The ChainMapper class can be used to run multiple mappers within a single map task. The mapper classes run in a chained pattern: the output of the first mapper becomes the input of the second mapper, and so on until the last mapper, whose output is written to the task's output.

There is no need to specify the output key/value classes for the ChainMapper itself; they are taken from the addMapper() call for the last mapper in the chain.

The ChainMapper class is defined in the org.apache.hadoop.mapreduce.lib.chain package.

Chain Mapper usage pattern
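A sketch of the usage pattern in a driver is shown below; the mapper class names AMap and BMap and the key/value types are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;

// ... inside the driver's main():
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "chain-mapper-example");

// First mapper in the chain: reads the raw input records.
ChainMapper.addMapper(job, AMap.class,
    LongWritable.class, Text.class,   // input key/value types
    Text.class, Text.class,           // output key/value types
    new Configuration(false));

// Second mapper: consumes the first mapper's output; its output
// types become the map task's output types.
ChainMapper.addMapper(job, BMap.class,
    Text.class, Text.class,
    Text.class, IntWritable.class,
    new Configuration(false));
```

Each addMapper() call carries its own input and output types, which is how Hadoop checks that consecutive links of the chain are compatible.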

A few of the reducers available in the MapReduce API are listed below.

IntSum Reducer

This reducer class outputs the sum of the integer values associated with each reducer input key. The IntSumReducer class is in the org.apache.hadoop.mapreduce.lib.reduce package.
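Per key, the reduce step is just an integer sum; the sum() helper below is a hypothetical standalone stand-in for IntSumReducer.reduce() applied to one key's values:

```java
import java.util.List;

// Standalone illustration of the IntSumReducer logic: for one reducer
// key, add up all of the int values and emit the total.
public class IntSumDemo {
  // Hypothetical helper mirroring IntSumReducer.reduce() for one key.
  static int sum(List<Integer> values) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    // e.g. the three 1s emitted for the word "hello" by a token counter
    System.out.println("hello\t" + sum(List.of(1, 1, 1)));
  }
}
```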

LongSum Reducer

This reducer class outputs the sum of the long values per reducer input key. The LongSumReducer class is also in the org.apache.hadoop.mapreduce.lib.reduce package.

Chain Reducer

The ChainReducer class permits running a chain of mapper classes after a reducer within the reduce task. The output of the reducer becomes the input of the first mapper, the output of the first mapper becomes the input of the second, and so on until the last mapper, whose output is written to the task's output.

The ChainReducer class is defined in the org.apache.hadoop.mapreduce.lib.chain package.

ChainReducer usage pattern
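A sketch of the usage pattern is shown below; the class names XReduce and CMap and the key/value types are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

// ... inside the driver's main(), after the Job has been created:
// The reducer itself is set first in the reduce-side chain.
ChainReducer.setReducer(job, XReduce.class,
    Text.class, IntWritable.class,   // reducer input key/value types
    Text.class, IntWritable.class,   // reducer output key/value types
    new Configuration(false));

// Mappers appended after the reducer post-process its output.
ChainReducer.addMapper(job, CMap.class,
    Text.class, IntWritable.class,
    Text.class, IntWritable.class,
    new Configuration(false));
```

Note the asymmetry with ChainMapper: the reducer is registered with setReducer(), and any number of post-processing mappers are then attached with addMapper().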

Usage of Predefined Mapper & Reducers in Word Count Example

By using the above-mentioned predefined mapper and reducer classes in our Word Count MapReduce example program, we can rewrite the same program easily in a single driver class as shown below.
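A single-class driver along these lines (the class name and the use of IntSumReducer as a combiner are assumptions consistent with the description that follows) could look like:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Word count with no custom mapper or reducer code: the predefined
// TokenCounterMapper and IntSumReducer do all the work.
public class WordCountWithPredefinedClasses {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wordcount");
    job.setJarByClass(WordCountWithPredefinedClasses.class);

    job.setMapperClass(TokenCounterMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because IntSumReducer is generic in its key type, the same class can serve as both the combiner and the reducer here.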

Compile the program and build a jar file to execute the mapreduce job with the commands as shown below.
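The exact commands depend on the Hadoop version and local paths; a typical sequence (file, jar, and HDFS directory names are assumed) is:

```shell
# Compile against the Hadoop client jars and package into a jar
javac -classpath "$(hadoop classpath)" -d classes WordCountWithPredefinedClasses.java
jar -cvf wordcount.jar -C classes .

# Run the job; the input and output arguments are HDFS paths
hadoop jar wordcount.jar WordCountWithPredefinedClasses /in /out
```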

[Screenshots: execution of the above commands in the terminal.]

Validate the output results in the output directory.

[Screenshot: word count job output.]

Here, the TokenCounterMapper class simply splits each input line into a series of (token, 1) pairs, and the IntSumReducer class produces the final count by summing the values for each key.

Thus, predefined mapper and reducer classes can be reused when writing many MapReduce programs.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
