In the previous posts under the Avro category, we examined the Java and Ruby APIs for Avro serialization and deserialization. Continuing Avro's interoperability testing, this post examines the Python API by creating a sample Avro data file and reading its contents back.
Before running Avro serialization examples in Python, we need to build the Avro Python library. We can build it from the Avro source files, which are available for download from the Apache mirrors.
- Download avro-src-x.y.z.tar.gz from the above download mirrors into a preferred location, usually /usr/lib/avro, and extract the gzipped archive.
- Change into the lang/py subdirectory under the main source folder and perform the activities below from the terminal to build the Avro Python library.
The screenshots below show the Avro Python library being built in the terminal.
- Verify the build of the Avro Python library by running the “import avro” command in the Python shell. If it does not throw any import errors, the installation is successful.
As we did not receive any import errors in the above screen, we are now ready to create Avro data files via Python. Press Ctrl+D to exit the Python shell.
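The same check can also be scripted instead of typed into the shell. The following is a minimal sketch, assuming a Python 3 interpreter (importlib.util is not available on Python 2); the helper name avro_installed is made up for illustration.

```python
import importlib.util


def avro_installed():
    """Return True if a package named 'avro' can be found on the import path."""
    # find_spec only locates the package; it does not import it.
    return importlib.util.find_spec("avro") is not None


if __name__ == "__main__":
    if avro_installed():
        import avro  # the same statement we typed into the Python shell
        print("avro imported from", avro.__file__)
    else:
        print("avro is not installed for this interpreter")
```

If the second message is printed, the library was most likely installed for a different Python interpreter than the one running the script.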
Avro Serialization & Deserialization
Similar to Ruby, the Avro Python library does not support code generation from a schema. So, we need to parse the schema at run time, when the Avro data file is written. To test creating an Avro data file and reading its contents back, we will use the schema below, a record type with two fields. Save it into a pair.avsc file.
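As an illustration of what such a schema can look like, here is a minimal sketch that builds a two-field record schema and saves it as pair.avsc. The record name Pair and the string fields left and right are assumptions made for this example, and save_schema is a hypothetical helper, not part of the Avro library.

```python
import json

# Assumed schema: the record name and both field names are illustrative.
PAIR_SCHEMA = {
    "type": "record",
    "name": "Pair",
    "doc": "A pair of strings.",
    "fields": [
        {"name": "left", "type": "string"},
        {"name": "right", "type": "string"},
    ],
}


def save_schema(path="pair.avsc"):
    """Write the schema as JSON, which is the format of .avsc files."""
    with open(path, "w") as schema_file:
        json.dump(PAIR_SCHEMA, schema_file, indent=2)
    return path


if __name__ == "__main__":
    print("Wrote", save_schema())
```

An .avsc file is plain JSON, so the same content can just as well be typed into pair.avsc by hand.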
In the code snippet below, we create a sample Avro data file with the help of the above pair.avsc schema and read its contents back onto the console. Save the snippet into an AvroWriteRead.py file.
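A write-and-read round trip of this kind can be sketched as follows. This is a hedged example rather than a definitive implementation: it assumes that pair.avsc describes a record with two string fields named left and right, that the data file is called pairs.avro, and that the installed library exposes either avro.schema.parse or, in some older Python 3 builds, avro.schema.Parse. The function name write_and_read and the sample values are made up for illustration.

```python
# Sample records; the field names "left" and "right" are assumptions and
# must match whatever fields pair.avsc actually declares.
SAMPLE_PAIRS = [
    {"left": "a", "right": "1"},
    {"left": "b", "right": "2"},
    {"left": "c", "right": "3"},
]


def write_and_read(schema_path="pair.avsc", data_path="pairs.avro"):
    """Write SAMPLE_PAIRS to an Avro data file and return what is read back."""
    # Imported inside the function so a missing library only fails when called.
    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # No code generation: the schema is parsed from its JSON text at run time.
    parse = getattr(avro.schema, "parse", None) or avro.schema.Parse
    with open(schema_path) as schema_file:
        schema = parse(schema_file.read())

    # Serialization: the writer stores the schema in the file header.
    writer = DataFileWriter(open(data_path, "wb"), DatumWriter(), schema)
    try:
        for pair in SAMPLE_PAIRS:
            writer.append(pair)
    finally:
        writer.close()

    # Deserialization: the reader picks the schema up from the data file.
    reader = DataFileReader(open(data_path, "rb"), DatumReader())
    try:
        return [dict(record) for record in reader]
    finally:
        reader.close()
```

To print the records when the script is executed, add a loop over write_and_read() at the bottom of the file that calls print on each returned record.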
Execute the above Python script from the terminal, for example with python AvroWriteRead.py.
So, we have successfully built the Avro Python library and tested a sample Avro example in Python that creates an Avro data file and reads it back onto the console.