Run Example MapReduce Program


To test a YARN/MapReduce installation, we can run an example MapReduce program (the word count job) from the Hadoop download directory. Each Hadoop release ships its MapReduce examples in the share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar file.

In this demonstration, we use the wordcount MapReduce program from the above jar. It counts the occurrences of each word in an input file and writes the counts to an output file.
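To make the expected result concrete, here is a minimal local sketch in plain Python of what a word count computes, split into a map step and a reduce step. It is only an illustration and not the Hadoop implementation: the function names `map_words`, `reduce_counts` and `word_count` are hypothetical, the sample lines are made up, and words are assumed to be separated by whitespace.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step on every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["hello hadoop", "hello yarn"]  # made-up sample input
    for word, total in sorted(word_count(sample).items()):
        print(f"{word}\t{total}")
```

The real job does the same kind of counting, but distributes the map and reduce steps across the cluster.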

1. Create an input test file in the local file system and copy it to HDFS.

[Screenshot: word count input]
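Since the exact commands are only visible in the screenshot, the following is a rough sketch of this step rather than a copy of it. It writes a small local file and prints the `hdfs dfs` commands that would create an HDFS directory and upload the file. The file name `wordcount_input.txt`, the HDFS directory `/test/input`, the sample text and the helper names are all assumptions made for illustration; the commands are printed, not executed, so they can be reviewed before being pasted into a shell.

```python
import shlex
from pathlib import Path


def write_sample_input(path, text="hello hadoop\nhello yarn\n"):
    """Create a small local input file (the sample text is made up)."""
    target = Path(path)
    target.write_text(text)
    return target


def input_setup_commands(local_file, hdfs_dir):
    """Build the HDFS commands that create a directory and upload a file."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", str(local_file), hdfs_dir],
    ]


if __name__ == "__main__":
    # Assumed names: adjust the local file and the HDFS directory as needed.
    local_file = write_sample_input("wordcount_input.txt")
    for command in input_setup_commands(local_file, "/test/input"):
        print(shlex.join(command))
```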

2. Run the MapReduce job with the command below.

In this command:

  • The third argument is the jar file that contains the class file (WordCount.class) for the wordcount program.
  • The fourth argument is the name of the program that drives the MapReduce job (wordcount in this case).
  • The fifth argument is the path to the input data set.
  • The last argument is the directory under which the output files will be created. This directory must not exist before the job runs. Otherwise the job fails with “org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/test/output already exists”.
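The argument positions described above can be sketched as a small command builder. This is an assumption-laden illustration, not the exact command from the screenshot: it assumes the job is launched with `hadoop jar`, keeps the `x.y.z` version placeholder as written in this post, and uses `/test/input` as a hypothetical input path. The output directory `/test/output` is taken from the exception message quoted above, and the helper name `wordcount_command` is made up.

```python
import shlex


def wordcount_command(examples_jar, input_path, output_dir):
    """Build the argument list: hadoop jar <jar> wordcount <input> <output>."""
    return ["hadoop", "jar", examples_jar, "wordcount", input_path, output_dir]


if __name__ == "__main__":
    command = wordcount_command(
        # Replace x.y.z with the version of the installed Hadoop release.
        "share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar",
        "/test/input",   # assumed input path in HDFS
        "/test/output",  # must not exist before the job runs
    )
    # Printed for review; run it from the Hadoop download directory.
    print(shlex.join(command))
```

Counting from one, the jar is the third item, `wordcount` the fourth, the input path the fifth and the output directory the last, matching the list above.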

[Screenshots: MapReduce job run]

3. If the job completes successfully and returns messages similar to the screens above, verify the output of the MapReduce job.

The output directory will contain:

  • A _SUCCESS file, which is just a flag file indicating that the MapReduce job completed successfully. It is a zero-length file with no contents.
  • One part-r-xxxxx file for each reducer, so the number of part files equals the number of reducers run as part of the job. The actual output is written to these part files.
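As a hedged sketch of this verification, the function below inspects a local copy of the output directory (for example, one fetched with `hdfs dfs -get`). It checks for the `_SUCCESS` flag and merges the counts from every `part-r-*` file. It assumes each output line has the form word, tab, count; the function name `read_job_output` is hypothetical.

```python
from pathlib import Path


def read_job_output(output_dir):
    """Return (succeeded, counts) for a local copy of a job output directory."""
    output_dir = Path(output_dir)
    # The empty _SUCCESS file is only a flag; its presence is what matters.
    succeeded = (output_dir / "_SUCCESS").is_file()
    counts = {}
    # Each reducer writes one part-r-xxxxx file, so read all of them.
    for part in sorted(output_dir.glob("part-r-*")):
        for line in part.read_text().splitlines():
            if not line:
                continue
            # Assumed line format: word<TAB>count
            word, _, count = line.rpartition("\t")
            counts[word] = int(count)
    return succeeded, counts
```

A call such as `read_job_output("output")` on the downloaded directory returns whether the flag file was found, together with a dictionary of word counts.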

[Screenshot: MapReduce output]

Job execution and outputs can also be verified through the web interface. Please refer to this post to learn how to browse HDFS and the Job tracker using the web user interface.

Note:

If the output directory already exists, the MapReduce job will fail with org.apache.hadoop.mapred.FileAlreadyExistsException, as shown in the screen below.

[Screenshot: MR file exists exception]

In this case, delete the output directory and re-execute the job.
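A minimal sketch of this recovery, under the same assumptions as the rest of the walkthrough: remove the stale output with `hdfs dfs -rm -r`, then submit the same job again. The directory `/test/output` comes from the exception message quoted earlier, while the helper name `rerun_commands`, the `hadoop jar` launcher and the input path `/test/input` are assumptions. The commands are printed rather than executed because the delete is recursive and cannot be undone.

```python
import shlex


def rerun_commands(job_command, output_dir):
    """Delete the existing output directory, then run the same job again."""
    cleanup = ["hdfs", "dfs", "-rm", "-r", output_dir]
    return [cleanup, list(job_command)]


if __name__ == "__main__":
    job = [
        "hadoop", "jar",
        # Replace x.y.z with the version of the installed Hadoop release.
        "share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar",
        "wordcount", "/test/input", "/test/output",
    ]
    for command in rerun_commands(job, "/test/output"):
        print(shlex.join(command))
```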



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, who has been involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

