In this post we will discuss the following aspects of Avro schemas.
- Avro Data Types
- Defining a schema
- Compiling the Schema and Code generation
Avro Schemas are defined in JSON. Schemas are composed of primitive data types or complex data types.
Avro’s primitive types are listed below.
| Type | Description |
|---|---|
| “null” | no value |
| “boolean” | a binary value |
| “int” | 32-bit signed integer |
| “long” | 64-bit signed integer |
| “float” | single-precision (32-bit) floating-point number |
| “double” | double-precision (64-bit) floating-point number |
| “bytes” | sequence of 8-bit unsigned bytes |
| “string” | Unicode character sequence |
Primitive type names are also defined type names. Thus, for example, the schema “string” is equivalent to {"type": "string"}.
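To make the equivalence concrete, here is a minimal sketch that uses only Python's standard json module (not the Avro library) to parse both spellings of the schema. The helper name type_name is hypothetical and only covers these two simple forms.

```python
import json


def type_name(schema_text):
    """Return the type name of a schema given as JSON text.

    Illustrative helper (not part of Avro): handles a bare JSON string
    such as "string" and the object form {"type": "string"}.
    """
    schema = json.loads(schema_text)
    if isinstance(schema, str):
        return schema
    return schema["type"]


short_form = '"string"'
long_form = '{"type": "string"}'

# Both spellings name the same primitive type.
print(type_name(short_form) == type_name(long_form))
```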
Avro supports six kinds of complex types: records, enums, arrays, maps, unions and fixed.
Records: A collection of named fields of any type. The type name must be “record”, and its important attributes are listed below.
- name: Name of the record (required).
- namespace: A JSON string that qualifies the name (optional).
- doc: Documentation for this schema (optional).
- aliases: Alternate names for this record (optional).
- fields: A JSON array listing the fields (required). Each field is a JSON object with the following attributes:
  - name: Name of the field (required).
  - type: The schema of the field: a type name, a JSON object defining a schema, or a JSON array for a union (required).
  - default: A default value for this field (optional).
  - order: How this field affects the sort ordering of the record (optional). Valid values are “ascending” (default), “descending”, or “ignore”.
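As a rough illustration of the record attributes above, the sketch below embeds a small record schema as JSON text and parses it with Python's standard json module. The names “Person”, “Human”, “name” and “age” are made up for this example, and the check is only a shape check, not real Avro validation.

```python
import json

# Hypothetical record schema; all names here are illustrative only.
RECORD_SCHEMA = """
{
  "type": "record",
  "name": "Person",
  "doc": "A person with a name and an age.",
  "aliases": ["Human"],
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int", "default": 0, "order": "descending"}
  ]
}
"""

schema = json.loads(RECORD_SCHEMA)

# Every field object needs at least "name" and "type".
for field in schema["fields"]:
    assert "name" in field and "type" in field

field_names = [field["name"] for field in schema["fields"]]
print(field_names)
```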
Enums: A set of named values. The type name must be “enum”, and its important attributes are listed below.
- name: Name of the enum (required).
- symbols: a JSON array, listing symbols, as JSON strings (required). All symbols in an enum must be unique; duplicates are prohibited.
Arrays: An ordered collection of objects. All objects in a particular array must have the same schema. The type name must be “array”, and it supports a single attribute, “items”. Example: {"type": "array", "items": "string"} declares an array of strings.
Maps: An unordered collection of key-value pairs. Keys must be strings; values may be of any type. The type name must be “map”, and it supports a single attribute, “values”. Example: {"type": "map", "values": "long"} declares a map from string to long.
Unions: A union is represented by a JSON array in which each element is a schema. For example, ["null", "string"] declares a schema that may be either a null or a string.
Fixed: A fixed number of 8-bit unsigned bytes. The type name must be “fixed”, and it supports two attributes, “name” and “size”. Example: {"type": "fixed", "name": "md5", "size": 16} declares a 16-byte type named md5.
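Because “items”, “values” and each union branch are themselves schemas, the complex types can nest inside one another. The sketch below shows an enum and one nested schema, parsed with Python's standard json module. The enum name “Suit”, its symbols, and the helper has_unique_symbols are illustrative assumptions, not part of any Avro API.

```python
import json

# Hypothetical enum; "Suit" and its symbols are illustrative names.
ENUM_SCHEMA = """
{"type": "enum", "name": "Suit",
 "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}
"""

# A map whose values are arrays of nullable strings: "values", "items"
# and each union branch are themselves schemas, so the types nest.
NESTED_SCHEMA = """
{"type": "map",
 "values": {"type": "array", "items": ["null", "string"]}}
"""


def has_unique_symbols(enum_schema):
    """Check the rule that enum symbols must not repeat."""
    symbols = enum_schema["symbols"]
    return len(symbols) == len(set(symbols))


enum_schema = json.loads(ENUM_SCHEMA)
nested_schema = json.loads(NESTED_SCHEMA)

print(has_unique_symbols(enum_schema))
print(nested_schema["values"]["items"])
```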
Defining a schema:
Using the primitive and complex types above, let us create a schema for employee records with four fields: joining date, role, department and salary.
Save the Avro schema in a file named employee.avsc.
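One possible shape for such a schema is sketched below as a Python dictionary that is serialized to JSON with the standard json module. Only the record name Employee_Record and the namespace example.avro are taken from this post; the field names, the field types, and the helper write_schema are assumptions made for illustration.

```python
import json

# Only "Employee_Record" and "example.avro" come from the post;
# the field names and types below are assumptions.
employee_schema = {
    "type": "record",
    "name": "Employee_Record",
    "namespace": "example.avro",
    "fields": [
        {"name": "joining_date", "type": "string"},
        {"name": "role", "type": "string"},
        # A union with "null" first lets the default be null.
        {"name": "dept", "type": ["null", "string"], "default": None},
        {"name": "salary", "type": "double"},
    ],
}


def write_schema(path="employee.avsc"):
    """Write the schema as JSON text to the given file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(employee_schema, handle, indent=2)
    return path


avsc_text = json.dumps(employee_schema, indent=2)
print(avsc_text)
```

Calling write_schema() would produce the employee.avsc file in the current directory; the print call only shows the JSON text.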
Compiling Schema & Code Generation:
Once the schema is defined, we can compile it to generate Java code. With the generated classes available, our programs no longer need to work with the schema directly.
We can generate the code with the compile tool of the avro-tools jar, passing the keyword schema, the schema file and a destination directory:
java -jar avro-tools-<version>.jar compile schema employee.avsc .
Here “.” denotes the current working directory as the destination for the generated code. The command creates an Employee_Record.java file under the package given by the schema’s namespace attribute (example.avro).
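The location of the generated file follows the usual Java package layout: the dots in the namespace become directory separators below the destination directory. A minimal sketch of that mapping, with a hypothetical helper name generated_source_path that is not part of avro-tools:

```python
from pathlib import PurePosixPath


def generated_source_path(namespace, name, dest="."):
    """Return where the generated .java file is expected to appear.

    Illustrative helper: the namespace dots become directories under
    the destination, as with any Java package.
    """
    package_dir = PurePosixPath(dest, *namespace.split("."))
    return package_dir / (name + ".java")


print(generated_source_path("example.avro", "Employee_Record"))
```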
Below is the code generated by compiling the above schema.