To test a YARN/MapReduce installation, we can run an example MapReduce program (the word count job) from the Hadoop download directory. A Hadoop release ships its MapReduce examples in the share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar file.
In this demonstration, we will use the wordcount MapReduce program from the above jar, which counts the occurrences of each word in an input file and writes the counts to an output file.
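The x.y.z part of the jar name depends on the installed release, so a glob over the directory named above is usually enough to find the exact file. The sketch below assumes the `HADOOP_HOME` environment variable points at the unpacked Hadoop directory (an assumption, not something configured in this post); `find_examples_jar` is a hypothetical helper name, not a Hadoop command.

```shell
# Assumption: HADOOP_HOME points at the unpacked Hadoop release directory.
# find_examples_jar is a hypothetical helper, not part of Hadoop.
find_examples_jar() {
    # Let the shell expand the x.y.z part of the name; print the first match.
    for jar in "${1:-$HADOOP_HOME}"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no examples jar found" >&2
    return 1
}
```

Calling `find_examples_jar` with no argument searches under `$HADOOP_HOME`; passing a directory searches under that directory instead.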
1. Create an input test file in the local file system and copy it to HDFS.
2. Run the MapReduce program (job) with the command below.
- Third argument is the jar file that contains the class file (WordCount.class) for the wordcount program.
- Fourth argument is the name of the example program to run (wordcount), which selects the driver class for the MapReduce job.
- Fifth argument is the path of the input data set.
- Last argument is the directory path under which the output files will be created. This output directory must not exist before the MapReduce job runs; otherwise the job fails with the exception “org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/test/output already exists”.
3. If the job completes successfully and returns messages similar to the screens above, verify the output of the MapReduce job.
The output directory will contain:
- A _SUCCESS file, which is just a flag file marking that the MapReduce job completed successfully. It is a zero-length file with no contents.
- One part-r-xxxxx file for each reducer, so the number of part files equals the number of reducers run as part of the job. The actual output is written to these part files.
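The three steps above can be sketched as shell functions that wrap the standard `hdfs dfs` and `hadoop jar` commands. The HDFS input path `/test/input`, the local file name `input.txt`, its sample contents, and the function names are assumptions for illustration; `/test/output` comes from the exception message quoted above, and the jar name keeps the x.y.z placeholder, which has to be replaced with the installed version.

```shell
# Assumed names: input.txt, /test/input and the function names are illustrative.
# Replace x.y.z with the version of the installed Hadoop release.
EXAMPLES_JAR=share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar

# Step 1: create a small local input file and copy it to HDFS.
prepare_input() {
    printf 'hello hadoop\nhello yarn\n' > input.txt
    hdfs dfs -mkdir -p /test/input
    hdfs dfs -put input.txt /test/input/
}

# Step 2: run the wordcount example; the output directory must not exist yet.
run_wordcount() {
    hadoop jar "$EXAMPLES_JAR" wordcount /test/input /test/output
}

# Step 3: list the output directory and print the first reducer's part file.
show_output() {
    hdfs dfs -ls /test/output
    hdfs dfs -cat /test/output/part-r-00000
}
```

With HDFS and YARN running, the functions can be chained from the Hadoop directory as `prepare_input && run_wordcount && show_output`. For the sample input, `hello` should be counted twice and `hadoop` and `yarn` once each.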
Job execution and outputs can also be verified through the web interface. Please refer to this post to learn how to browse HDFS and the Job Tracker using the web user interface.
If the output directory already exists, the MapReduce job will fail with org.apache.hadoop.mapred.FileAlreadyExistsException, as shown in the screen below.
In this case, delete the output directory and re-execute the job.
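A minimal sketch of that cleanup, assuming the output directory is `/test/output` as in the exception message and the input lives in `/test/input` (an assumed path). `rerun_wordcount` is a hypothetical helper name, and the jar name keeps the x.y.z placeholder. Note that `-rm -r` deletes the directory and everything in it, so earlier results should be copied elsewhere first if they are still needed.

```shell
# rerun_wordcount is a hypothetical helper; /test/input is an assumed path.
rerun_wordcount() {
    out_dir=${1:-/test/output}
    # Remove the old output directory only if it exists (-test -d checks for a directory).
    if hdfs dfs -test -d "$out_dir"; then
        hdfs dfs -rm -r "$out_dir"
    fi
    # Replace x.y.z with the version of the installed Hadoop release.
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-x.y.z.jar \
        wordcount /test/input "$out_dir"
}
```

Calling `rerun_wordcount` with no argument uses `/test/output`; a different output directory can be passed as the first argument.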