Merging Small Files Into Avro File

This post continues the previous post on the small files problem. In that post we merged a large number of small files in an HDFS directory into a SequenceFile; in this post we will merge a large number of small files on the local file system into an Avro file in an HDFS output directory. We will store the file names as keys and the file contents as values in the Avro file. We will […]
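The key/value layout described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions. The record name `SmallFile`, the field names `filename` and `contents`, and the helper `collect_small_files` are hypothetical. The step that actually writes the records to an Avro file on HDFS needs the Avro library and is left out.

```python
import json
import os
import tempfile

# Hypothetical schema: the record and field names are illustrative assumptions,
# not the schema used in the original post.
SMALL_FILE_SCHEMA = {
    "type": "record",
    "name": "SmallFile",
    "fields": [
        {"name": "filename", "type": "string"},  # key: the file name
        {"name": "contents", "type": "bytes"},   # value: the raw file contents
    ],
}


def collect_small_files(directory):
    """Hypothetical helper: build one {filename, contents} record per regular file."""
    records = []
    for name in sorted(os.listdir(directory)):  # sorted for a deterministic order
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            with open(path, "rb") as f:
                records.append({"filename": name, "contents": f.read()})
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            with open(os.path.join(tmp, f"part-{i}.txt"), "w") as f:
                f.write(f"line {i}\n")
        records = collect_small_files(tmp)
    print(json.dumps(SMALL_FILE_SCHEMA))
    # Writing `records` to an Avro file on HDFS would need the Avro library (not shown).
    print([r["filename"] for r in records])
```

Reading each file as raw bytes matches the `bytes` field type, so binary files pass through unchanged. Sorting the listing keeps the record order reproducible.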

Avro MapReduce 2 API Example

Avro supports both the old MapReduce package API (org.apache.hadoop.mapred) and the new MapReduce package API (org.apache.hadoop.mapreduce). Avro data can be used as the input to, the output from, and the intermediate format of a MapReduce job. In this post we will walk through an example run of the Avro MapReduce 2 API. It continues the previous post on the Avro MapReduce API. In this post, we will create […]

Avro MapReduce Word Count Example

In this post, we will discuss the well-known word count example in MapReduce and create a sample Avro data file in the Hadoop Distributed File System (HDFS). Prerequisite: to run the MapReduce word count program given in this post, the avro-mapred-1.7.4-hadoop2.jar file must be present in the $HADOOP_HOME/share/hadoop/common/lib directory. This jar contains the classes used for Avro serialization and deserialization through the MapReduce framework. For instructions on installation and integration of […]
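The map, shuffle, and reduce phases behind word count can be illustrated with a small in-memory Python sketch. It is not Hadoop or Avro code, and the function names are hypothetical. It only mirrors what the framework does: the mapper emits (word, 1) pairs, the framework groups the pairs by word, and the reducer sums each group.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every whitespace-separated token.
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: sum the counts for one word.
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    # Reduce each group in sorted key order, as Hadoop presents keys to a reducer.
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
    # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In the real job, the reducer output (word, count) would be written as Avro records instead of a Python dict.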

Avro Serialization & Deserialization – Python API

In the previous posts in the Avro category we examined the Java API and the Ruby API for Avro serialization and deserialization. As part of testing Avro’s interoperability, in this post we will examine the Python API by creating a sample Avro data file and reading its contents back. Prerequisite: before running the Avro serialization examples in Python, we need to build the Avro Python library. We can build this library by downloading the Avro source […]

Avro Serialization Ruby Example

In the previous posts in the Avro category we discussed the Java API for Avro serialization. Because Avro does not require code generation, we can use other languages to perform Avro serialization and deserialization. In this post, we will give a basic introduction to Avro serialization and deserialization through the Ruby API, which makes it noticeably easier than the Java API to create […]

Avro Serializing and Deserializing Example – Java API

In this post, we will discuss an example of Avro serialization and deserialization: we create an Avro data file (serializing data) and then deserialize the same file to read its contents back. This continues our previous post on Avro schemas, in which we defined a schema for an Employee record and compiled it with the avro-tools-1.7.4.jar file, which generated the Java code for the schema. In […]

Avro Schema Example Definition

In this post we will discuss the following aspects of Avro schemas:
- Avro data types
- Defining a schema
- Compiling the schema and code generation

Avro schemas are defined in JSON. Schemas are composed of primitive data types or complex data types.

Primitive Types: Avro’s primitive types are listed below.

Type        Description
“null”      no value
“boolean”   a binary value
“int”       32-bit signed integer
“long”      64-bit signed integer
“float”     single precision […]
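A record schema built from these primitive types is plain JSON, so it can be loaded with Python’s `json` module. The sketch below makes some assumptions. The `Employee` record and its fields are illustrative and are not the schema from the original post. The type check is a simplified, hand-written one. It covers all eight Avro primitive types (`null`, `boolean`, `int`, `long`, `float`, `double`, `bytes`, `string`), but it is not the validation performed by the Avro library itself.

```python
import json

# Hypothetical record schema; the field names are illustrative assumptions.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "name",   "type": "string"},
    {"name": "id",     "type": "int"},
    {"name": "salary", "type": "long"},
    {"name": "active", "type": "boolean"}
  ]
}
"""

INT32 = (-2**31, 2**31 - 1)  # range of Avro "int"
INT64 = (-2**63, 2**63 - 1)  # range of Avro "long"


def matches_primitive(avro_type, value):
    """Simplified check of a Python value against one Avro primitive type name."""
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type in ("int", "long"):
        lo, hi = INT32 if avro_type == "int" else INT64
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and lo <= value <= hi
    if avro_type in ("float", "double"):
        # Simplification: Python floats are double precision, so both map to float.
        return isinstance(value, float)
    if avro_type == "bytes":
        return isinstance(value, bytes)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"not a primitive type: {avro_type}")


def record_matches(schema, record):
    """True if every schema field is present with a value of its declared primitive type."""
    return all(
        f["name"] in record and matches_primitive(f["type"], record[f["name"]])
        for f in schema["fields"]
    )


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    print(record_matches(schema, {"name": "Ann", "id": 7, "salary": 5_000_000_000, "active": True}))  # True
    print(record_matches(schema, {"name": "Bob", "id": 2**31, "salary": 1, "active": False}))  # False: id overflows int
```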

Hadoop Integration – Avro Errors

In this post we will discuss some of the errors and exceptions that can occur when the Avro and Hadoop distributions being integrated do not match. If we do not use the correct Avro release, we will run into many errors and exceptions. In this post, we will consider version compatibility for the Hadoop-2.3.0 release. As it is Hadoop 2, we need to use the […]

Avro Serialization

In this post, we will give a basic introduction to Avro serialization.

What is Avro serialization? Avro is one of the best-known data serialization and deserialization frameworks, and it integrates well with almost all Hadoop platforms. The Avro framework was created by Doug Cutting, the creator of Hadoop, and it is now a full-fledged project under the Apache Software Foundation.

Need for Avro serialization: Hadoop’s native library provides Writables for data serialization (converting […]
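Here is a minimal Python sketch of how the Avro specification’s binary encoding writes a `long`, as a concrete case of converting a value into bytes. First the value is zig-zag mapped so that small negative numbers stay small. Then it is written as a variable-length base-128 integer, low-order 7 bits first. The function names are hypothetical, and this is not the Avro library’s API. A real project would use the library’s encoders and decoders.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode_long(n):
    """Map a signed 64-bit value to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def encode_long(n):
    """Zig-zag, then base-128 varint with the low 7 bits first."""
    z = zigzag_encode_long(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)  # last byte has the high bit clear
    return bytes(out)


def decode_long(data):
    """Inverse of encode_long for a single encoded value."""
    z = 0
    shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    # Undo zig-zag: even values are non-negative, odd values are negative.
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    print(encode_long(-64).hex())  # 7f
    print(encode_long(64).hex())   # 8001
    print(decode_long(encode_long(-300)))  # -300
```

Avro encodes `int` values the same way. As a result, small magnitudes of either sign take a single byte.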

Review Comments

I am a PL/SQL developer, interested in moving into big data.

Neetika Singh ITA Hadoop in Dec/2016 December 22, 2016