Avro Serializing and Deserializing Example – Java API


In this post, we walk through an example of Avro serialization and deserialization: creating an Avro data file (serializing data) and then deserializing the same file to read its contents back. This is a continuation of our previous post on Avro Schema, in which we defined a schema for an Employee record and compiled it with avro-tools-1.7.4.jar, which generated the Java code for the schema. In this post, we will discuss the following topics:

  • Serializing and Deserializing with Code generation
  • Serializing and Deserializing without Code generation

Throughout, we focus on the Avro Java API for serializing and deserializing data, both with and without code generation.

With Code generation:
Serializing:

Let's create some employee records in an Avro data file with the help of the generated Employee_Record.java class in the example.avro package. Copy the lines of code below into a GenerateDataWithCode.java program in the example package. In Eclipse, the generated class and our programs go into the packages example.avro and example, respectively.

This program creates employee records in three ways (calling setter methods, using the constructor, and via the Builder class) and serializes these employee objects into an Avro data file with the help of the SpecificDatumWriter and DataFileWriter classes of the Avro library. Below are a few details about these classes.

  • SpecificDatumWriter – writes Java objects of a generated (specific) class, such as Employee_Record, according to its schema. It implements the base interface DatumWriter, which converts Java objects into Avro's binary serialized format.
  • DataFileWriter – stores a sequence of data conforming to a schema in a file. The schema is stored in the file with the data, and every datum in a file has the same schema. Data is written with a DatumWriter and grouped into blocks. A synchronization marker is written between blocks, so that files can be split. Blocks can be compressed. Extensible metadata (such as the schema and the compression codec) is stored in the file header. Files may be appended to.
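The DataFileWriter behaviour listed above (schema stored with the data, compressible blocks, appendable files) does not depend on code generation. The following is a minimal sketch, assuming Avro 1.7.x or later and Java 7+; to stay compilable without the generated Employee_Record class, it uses GenericRecord with a hypothetical one-field `Sample` schema. With the generated class, the writer would instead be built from `new SpecificDatumWriter<Employee_Record>(Employee_Record.class)`. The deflate level 6 is an arbitrary illustrative choice.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileWriterSketch {

    // Hypothetical one-field schema; it exists only to exercise DataFileWriter.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Sample\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}");

    static GenericRecord sample(int id) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        return record;
    }

    /** Writes `first` records, reopens the file in append mode, writes `more`, returns the total read back. */
    static int writeAppendAndCount(File file, int first, int more) throws IOException {
        // create() writes the header (magic bytes, schema and codec metadata, sync marker).
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.setCodec(CodecFactory.deflateCodec(6)); // must be called before create()
            writer.create(SCHEMA, file);
            for (int i = 0; i < first; i++) {
                writer.append(sample(i));
            }
        } // close() flushes the last block
        // appendTo() reads the existing header and continues with the same schema and codec.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.appendTo(file);
            for (int i = first; i < first + more; i++) {
                writer.append(sample(i));
            }
        }
        int count = 0;
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {
                reader.next();
                count++;
            }
        }
        return count;
    }

    /** Returns the codec name recorded in the file header metadata. */
    static String codecOf(File file) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            return reader.getMetaString("avro.codec");
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sample", ".avro");
        file.deleteOnExit();
        System.out.println(writeAppendAndCount(file, 3, 2) + " records, codec " + codecOf(file));
    }
}
```

Running `main` should report five records and the `deflate` codec: the second writer reuses the schema, codec and sync marker from the existing header rather than writing a new one.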

After compiling GenerateDataWithCode.java (with each class in its correct package), we can run the program directly in Eclipse. The employees.avro file is then created in the Eclipse project folder.

[Screenshot: Generate Data]

Below is a snapshot of the project folder after running the program in Eclipse.

[Screenshot: Generate Data 2]

The Avro data file has now been created successfully.

Deserializing:

Now let's read the Avro data file with the program below, which uses the Employee_Record class to read the employee objects and print them to the console. Copy the following lines of code into a DeserializeWithCode.java program.

This program deserializes the employees.avro data file with the help of two classes from the Avro library:

  • DataFileReader – provides random access to files written with DataFileWriter.
  • SpecificDatumReader – reads data of a schema into instances of the generated class (here, Employee_Record). It implements the DatumReader interface.
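DataFileReader's random access relies on the sync markers between blocks: `DataFileWriter.sync()` ends the current block and returns a position that `DataFileReader.seek()` can jump to later. The following is a minimal sketch, assuming Avro 1.7.x or later; like the writer sketch, it uses GenericRecord and a hypothetical `Sample` schema so it runs without Employee_Record. For the generated class, the reader would be `new DataFileReader<Employee_Record>(file, new SpecificDatumReader<Employee_Record>(Employee_Record.class))`, and the loop would otherwise look the same.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class RandomAccessSketch {

    // Hypothetical one-field schema standing in for Employee_Record.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Sample\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}");

    /** Writes ids 0..9 and returns the sync position recorded right after id 4. */
    static long writeWithSyncPoint(File file) throws IOException {
        long syncPosition = -1;
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, file);
            for (int i = 0; i < 10; i++) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", i);
                writer.append(record);
                if (i == 4) {
                    // sync() ends the current block and returns a position usable with seek().
                    syncPosition = writer.sync();
                }
            }
        }
        return syncPosition;
    }

    /** Jumps straight to a known sync position and reads every record from there on. */
    static List<Integer> readFrom(File file, long position) throws IOException {
        List<Integer> ids = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            reader.seek(position);
            GenericRecord reuse = null;
            while (reader.hasNext()) {
                reuse = reader.next(reuse); // refills the same record object when possible
                ids.add((Integer) reuse.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("random-access", ".avro");
        file.deleteOnExit();
        long position = writeWithSyncPoint(file);
        System.out.println(readFrom(file, position));
    }
}
```

Because the sync point is taken after the record with id 4, seeking to it skips the first block and `main` prints `[5, 6, 7, 8, 9]`. Passing the previous record to `next(reuse)` lets Avro refill the same object, a common pattern when scanning large files.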

Compile the DeserializeWithCode program and run it in Eclipse. Below is a snapshot of the console output after running the program.

[Screenshot: Read avro data file]

Without Code generation:

Because Avro data files store the schema along with the actual data blocks, we can always read a serialized item, whether or not we know the schema ahead of time. In this section, we'll create some employee records, serialize them to a data file on disk, and then read the file back and deserialize the employee objects.

Serializing:

Since we are not using the Employee_Record class to create the objects, we use generic records (GenericRecord) with a GenericDatumWriter instead of the SpecificDatumWriter; DataFileWriter is used in the same way as before.
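A minimal sketch of the generic write path, assuming Avro 1.7.x or later. The schema string is a hypothetical stand-in for the Employee schema from the previous post (its fields `name`, `id` and `salary` are assumptions, so substitute your own .avsc), and the sample employees and the output file name `employees_generic.avro` are illustrative.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class GenerateDataWithoutCode {

    // Hypothetical Employee schema; replace it with the schema from the previous post.
    static final String EMPLOYEE_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Employee_Record\",\"namespace\":\"example.avro\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"id\",\"type\":\"int\"},"
        + "{\"name\":\"salary\",\"type\":\"int\"}]}";

    static GenericRecord employee(Schema schema, String name, int id, int salary) {
        // GenericData.Record holds field values by name; no generated class is needed.
        GenericRecord record = new GenericData.Record(schema);
        record.put("name", name);
        record.put("id", id);
        record.put("salary", salary);
        return record;
    }

    /** Serializes three sample employees into `file` and returns how many were written. */
    static int writeEmployees(File file) throws IOException {
        Schema schema = new Schema.Parser().parse(EMPLOYEE_SCHEMA);
        GenericRecord[] employees = {
            employee(schema, "Alice", 1, 5000), // hypothetical sample data
            employee(schema, "Bob", 2, 6000),
            employee(schema, "Carol", 3, 7000)
        };
        // GenericDatumWriter replaces SpecificDatumWriter; DataFileWriter is used as before.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            for (GenericRecord e : employees) {
                writer.append(e);
            }
        }
        return employees.length;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "employees_generic.avro");
        System.out.println("Wrote " + writeEmployees(file) + " records to " + file.getAbsolutePath());
    }
}
```

Because the schema is parsed at run time, the same code works for any record schema; the trade-off compared with Employee_Record is that a misspelled field name is caught only when `put` runs, not at compile time.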

After compiling and running this serialization program from Eclipse, the project folder looks as shown in the snapshot below.

[Screenshot: Generate Data 3]

Deserializing:

We can deserialize the Avro data file created above with a Java program built on two classes:

  • GenericDatumReader – converts serialized items into GenericRecord objects, using the schema stored in the file when no schema is supplied.
  • DataFileReader – reads the Avro data file from disk.
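The sketch below shows the generic read path under the same assumptions (Avro 1.7.x or later). The GenericDatumReader is created without any schema, so DataFileReader supplies the writer's schema from the file header. The `writeSample` helper and its one-record file are hypothetical and exist only so the example runs on its own; pass the path of a real file, such as the one written in the previous step, as the first argument to read that instead.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DeserializeWithoutCode {

    /** Reads every record; the reader gets no schema, so the one in the file header is used. */
    static List<GenericRecord> readAll(File file) throws IOException {
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            System.out.println("Schema found in file: " + reader.getSchema());
            for (GenericRecord record : reader) {
                records.add(record); // next() without reuse returns a fresh record each time
            }
        }
        return records;
    }

    /** Hypothetical helper that writes a tiny file so this sketch runs on its own. */
    static void writeSample(File file) throws IOException {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee_Record\","
            + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"id\",\"type\":\"int\"}]}");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            GenericRecord record = new GenericData.Record(schema);
            record.put("name", "Alice");
            record.put("id", 1);
            writer.append(record);
        }
    }

    public static void main(String[] args) throws IOException {
        File file;
        if (args.length > 0) {
            file = new File(args[0]); // e.g. the file written in the previous step
        } else {
            file = File.createTempFile("sample", ".avro");
            file.deleteOnExit();
            writeSample(file);
        }
        for (GenericRecord record : readAll(file)) {
            System.out.println(record);
        }
    }
}
```

String fields come back as `org.apache.avro.util.Utf8` rather than `java.lang.String`, so call `toString()` on them before comparing.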

[Screenshot: Deserialize data 2]

So, we have successfully tested serialization and deserialization without code generation.


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved in several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

