Avro Serialization & Deserialization – Python API 1


In the previous posts under the Avro category we examined the Java API and the Ruby API for Avro serialization and deserialization. As part of Avro's interoperability testing, in this post we will examine the Python API by creating a sample Avro data file and reading its contents back.

Prerequisite:

Before running Avro serialization examples in Python, we need to build the Avro Python library. We can build this library by downloading the Avro source files from the Apache mirrors.

  • Download avro-src-x.y.z.tar.gz from the above Apache mirrors into our preferred location, usually /usr/lib/avro, and extract the gzipped tarball.
  • Change directory into the lang/py sub-directory under the main source folder and perform the below activities from the terminal to build the Avro Python library.

Below are the screenshots of the Avro Python library build via the terminal.

[Screenshots: building the Avro Python library from the terminal]
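For reference, the build typically boils down to commands along the following lines. This is only a sketch: the extracted directory name, the use of sudo, and python pointing at a Python 2 interpreter are assumptions that depend on your release and environment.

    # a sketch of the build, assuming the tarball was downloaded to /usr/lib/avro
    cd /usr/lib/avro
    tar -xzf avro-src-x.y.z.tar.gz

    # build and install the Python library from the lang/py sub-directory
    # (the extracted directory name may differ by release)
    cd avro-src-x.y.z/lang/py
    sudo python setup.py install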

  • Verify the build of the Avro Python library by running "import avro" in the python shell; if it does not throw any import errors, our installation is successful.
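A quick check from the terminal might look like the session below (a sketch; the exact prompt and version banner will differ):

    $ python
    >>> import avro
    >>>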

As the import did not throw any errors, we are now ready to create Avro data files via Python. Press Ctrl+D to exit the python shell.

Avro Serialization & Deserialization

Similar to Ruby, the Avro Python library does not support code generation from a schema, so we need to parse the schema at the time of writing the Avro data file itself. To test creating an Avro data file and reading the contents back, we will use the below schema of record type with two fields. Save this into a pair.avsc file.
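The schema itself is a plain JSON file. A minimal sketch of a two-field record is shown below; the record name and the field names left and right are illustrative, not necessarily those used in the original pair.avsc.

    {
      "type": "record",
      "name": "Pair",
      "doc": "A pair of strings; record and field names are illustrative.",
      "fields": [
        {"name": "left",  "type": "string"},
        {"name": "right", "type": "string"}
      ]
    }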

In the below code snippet, we create a sample Avro data file with the help of the above pair.avsc schema and read its contents back onto the console. Save the below code snippet into an AvroWriteRead.py file.
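A minimal sketch of such a script, assuming the pair.avsc above and an output file named pairs.avro (both illustrative names), could look like the following. Note that the spelling of the schema-parsing function differs between older and newer releases of the library (parse vs. Parse).

    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # Parse the schema at runtime -- the Python library has no code generation.
    schema = avro.schema.parse(open("pair.avsc").read())

    # Serialize a couple of records into an Avro data file.
    writer = DataFileWriter(open("pairs.avro", "wb"), DatumWriter(), schema)
    writer.append({"left": "L1", "right": "R1"})
    writer.append({"left": "L2", "right": "R2"})
    writer.close()

    # Deserialize the records and print them back onto the console.
    reader = DataFileReader(open("pairs.avro", "rb"), DatumReader())
    for pair in reader:
        print(pair)
    reader.close()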

Execute the above python script with the below command.
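Running it is just a matter of invoking the script with the same Python interpreter the library was installed for, for example:

    $ python AvroWriteRead.py

With the sketch above, the appended records should be printed back on the console as Python dictionaries.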

[Screenshot: console output of the AvroWriteRead.py run]

So, we have successfully built the Avro Python library and tested a sample Avro example in Python that creates an Avro data file and reads it back onto the console.



