Avro MapReduce 2 API Example


Avro supports both the old MapReduce API (org.apache.hadoop.mapred) and the new MapReduce API (org.apache.hadoop.mapreduce). Avro data can serve as the input to and the output of a MapReduce job, as well as the intermediate format.

In this post we walk through an example run of the Avro MapReduce 2 API. It continues the previous post on the Avro MapReduce API. We will create a sample schema, generate an Avro data file using Ruby, and run a MapReduce program that counts the colors in that data file.

Create a Schema:

To test Avro data files with MapReduce, we will create a sample schema as shown below. Copy the schema into a file named sample.avsc.
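As a rough guide to the shape such a schema could take, here is a minimal sketch modeled on the User schema from the Apache Avro getting-started guide. The namespace, record name, and field names (name, favorite_number, favorite_color) are assumptions for illustration and may differ from the schema actually used in this post; the only field the color count needs is a nullable string holding the color.

```json
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
```

Declaring favorite_color as a union with "null" lets a record omit the color, which is why a color-count job typically has to decide how to bucket missing values.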

Generate Avro Data:

Generate some sample Avro records that conform to the above schema and write them to a samplecolors.avro file using the Ruby code below.

Run the above Ruby program from a terminal to generate the samplecolors.avro file containing the Avro records, as shown in the screenshot below.

[Screenshot: generate data]

MapReduce Color Count Example:

Below is a sample MapReduce program that counts the colors in the above Avro data file. Copy the code snippet below into MapReduceColorCount.java.
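The real job needs the Hadoop and Avro jars on the classpath. To make the counting logic itself easier to follow, here is a dependency-free sketch in plain Java of what the map and reduce steps compute. It is not the Avro MapReduce API: the class name ColorCountSketch and the methods map, reduce, and countColors are hypothetical, records are reduced to a bare color string, and bucketing a missing color as "none" is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the color-count logic; no Hadoop or Avro classes are used.
// All names here (ColorCountSketch, map, reduce, countColors) are hypothetical.
class ColorCountSketch {

    // Map step: emit (color, 1) for one record's color.
    // Assumption: a record with no color is counted under the key "none".
    static Map.Entry<String, Integer> map(String favoriteColor) {
        String color = (favoriteColor == null) ? "none" : favoriteColor;
        return new SimpleEntry<>(color, 1);
    }

    // Reduce step: sum all the counts emitted for a single color.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every record, group values by key (the shuffle), then reduce each group.
    static Map<String, Integer> countColors(List<String> colors) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String c : colors) {
            Map.Entry<String, Integer> kv = map(c);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "blue", "red", null, "green", "blue", "red");
        // Prints {blue=2, green=1, none=1, red=3}
        System.out.println(countColors(colors));
    }
}
```

In the actual MapReduce job, the framework performs the grouping that countColors does by hand here, and the mapper reads the color from each Avro record rather than from a list of strings.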