Avro Schema Example Definition


In this post we will discuss the following aspects of Avro schemas.

  • Avro Data Types
  • Defining a schema
  • Compiling the Schema and Code generation

Avro schemas are defined in JSON. Schemas are composed of primitive data types or complex data types.

Primitive Types:

Avro’s primitive types are listed below.

Type        Description
“null”      no value
“boolean”   a binary value
“int”       32-bit signed integer
“long”      64-bit signed integer
“float”     single-precision (32-bit) floating-point number
“double”    double-precision (64-bit) floating-point number
“bytes”     sequence of 8-bit unsigned bytes
“string”    Unicode character sequence

Primitive type names are also defined type names. Thus, for example, the schema “string” is equivalent to:
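
  {"type": "string"}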

Complex Types:

Avro supports six kinds of complex types: records, enums, arrays, maps, unions and fixed.

  • Records:  A collection of named fields of any type. Type name must be “record” and it supports the following attributes:
    • name: Name of the record (required).
    • doc: Documentation for this schema (optional).
    • aliases: Alternate names for this record (optional).
    • fields: a JSON array, listing fields (required). Each field is a JSON object with the following attributes:
      • name: Name of the field (required).
      • type: The schema of the field, either a JSON string naming a defined type or a JSON object defining a schema (required).
      • default: A default value for this field (optional).
      • order: Sort ordering of this field (optional). Valid values are “ascending” (default), “descending”, or “ignore”.
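
Example (a simple, illustrative record):

  {
    "type": "record",
    "name": "Pair",
    "doc": "A pair of values",
    "fields": [
      {"name": "key", "type": "string"},
      {"name": "value", "type": "long"}
    ]
  }
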
  • Enums:  A set of named values. Type name must be “enum” and it supports the following attributes:
    • name: Name of the enum (required).
    • symbols: a JSON array, listing symbols, as JSON strings (required). All symbols in an enum must be unique; duplicates are prohibited.

Example: 
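
  {
    "type": "enum",
    "name": "Suit",
    "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
  }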

  • Arrays:  An ordered collection of objects. All objects in a particular array must have the same schema. Type name must be “array” and it supports a single attribute, “items”, which gives the schema of the array’s items. Example:
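
  {"type": "array", "items": "string"}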

  • Maps:  An unordered collection of key-value pairs. Keys must be strings; values may be of any type. Type name must be “map” and it supports a single attribute, “values”, which gives the schema of the map’s values. Example: a map from string to long.
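
  {"type": "map", "values": "long"}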

  • Unions: A union is represented by a JSON array, where each element in the array is a schema. For example, [“null”, “string”] declares a schema which may be either null or a string.
  • Fixed:  A fixed number of 8-bit unsigned bytes. Type name must be “fixed” and it supports two attributes: “name” and “size”. Example:
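
  {"type": "fixed", "name": "md5", "size": 16}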

Defining a schema:

With the help of the above primitive and complex types, let us create a schema for employee records with four fields: joining date, role, dept and salary.

Create the following Avro schema example as employee.avsc:
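
A minimal version of this schema, assuming string fields for joining date, role and dept and a double for salary, might look like this:

  {
    "namespace": "example.avro",
    "type": "record",
    "name": "Employee_Record",
    "doc": "Schema for employee records",
    "fields": [
      {"name": "joining_date", "type": "string", "doc": "date the employee joined"},
      {"name": "role", "type": "string", "doc": "job role"},
      {"name": "dept", "type": "string", "doc": "department"},
      {"name": "salary", "type": "double", "doc": "annual salary"}
    ]
  }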

Compiling Schema & Code Generation:

Once we have defined the schema, we can generate code for it by compiling the schema. With the generated code in place, there is no need to use the schema directly in our programs.

We can generate the code using the avro-tools jar as follows:
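
Assuming avro-tools-1.7.7.jar and employee.avsc are in the current directory (adjust the jar version to match your Avro release):

  java -jar avro-tools-1.7.7.jar compile schema employee.avsc .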

In the above command, note that “.” denotes the current working directory as the destination for the generated code. This will create an Employee_Record.java file under the package specified in the namespace attribute of the schema (example.avro).


Below is the code generated from the above schema compilation.
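
The full generated class runs to several hundred lines; abridged, and assuming the schema sketched above, its overall shape is roughly:

  package example.avro;

  import org.apache.avro.Schema;
  import org.apache.avro.specific.SpecificRecordBase;

  public class Employee_Record extends SpecificRecordBase {

    public static final Schema SCHEMA$ = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Employee_Record\",\"namespace\":\"example.avro\","
        + "\"fields\":[{\"name\":\"joining_date\",\"type\":\"string\"},"
        + "{\"name\":\"role\",\"type\":\"string\"},"
        + "{\"name\":\"dept\",\"type\":\"string\"},"
        + "{\"name\":\"salary\",\"type\":\"double\"}]}");

    // one Java field per schema field
    private CharSequence joining_date;
    private CharSequence role;
    private CharSequence dept;
    private double salary;

    @Override
    public Schema getSchema() { return SCHEMA$; }

    // positional access used by Avro's serialization machinery
    @Override
    public Object get(int field$) {
      switch (field$) {
        case 0: return joining_date;
        case 1: return role;
        case 2: return dept;
        case 3: return salary;
        default: throw new org.apache.avro.AvroRuntimeException("Bad index");
      }
    }

    @Override
    public void put(int field$, Object value$) {
      switch (field$) {
        case 0: joining_date = (CharSequence) value$; break;
        case 1: role = (CharSequence) value$; break;
        case 2: dept = (CharSequence) value$; break;
        case 3: salary = (Double) value$; break;
        default: throw new org.apache.avro.AvroRuntimeException("Bad index");
      }
    }

    // getter and setter per field (similar accessors exist for role, dept
    // and salary, plus a newBuilder() factory and a Builder class)
    public CharSequence getJoiningDate() { return joining_date; }
    public void setJoiningDate(CharSequence value) { this.joining_date = value; }
  }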

From the above auto-generated code, we can clearly observe the following:

  • All the fields in the schema record are defined as fields in the Java code.
  • The schema is parsed, and objects can be created either using the constructor or via a builder. Using the constructor does not initialize fields to their default values from the schema; if that is desired, we need to use the newBuilder() method, as shown below.
  • Each field has separate getter and setter methods defined in the program.
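
For example, assuming the standard generated accessors and builder (the field values below are illustrative):

  Employee_Record e1 = new Employee_Record();          // fields left at Java defaults
  Employee_Record e2 = Employee_Record.newBuilder()    // fields start from schema defaults, if any
      .setJoiningDate("2016-01-01")
      .setRole("developer")
      .setDept("IT")
      .setSalary(30000.0)
      .build();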

In the next post we will use this generated code to create Avro data files and deserialize them back to verify the data.

Note: A schema file can only contain a single schema definition.

