Monthly Archives: September 2014

Avro Serialization & Deserialization – Python API

In the previous posts in the Avro category, we examined the Java and Ruby APIs for Avro serialization and deserialization. As part of testing Avro’s interoperability, in this post we will examine the Python API by creating a sample Avro data file and reading its contents back. Prerequisite: before running the Avro serialization examples in Python, we need to build the Avro Python library. We can build this library by downloading the Avro source […]

Avro Serialization Ruby Example

In the previous posts in the Avro category, we discussed the Java API for Avro serialization. As Avro does not need code generation, even when schemas evolve, we can also use languages other than Java for Avro serialization and deserialization. In this post, we will provide a basic introduction to Avro serialization and deserialization via the Ruby API, which makes it considerably easier than the Java API to create […]

Avro Serializing and Deserializing Example – Java API

In this post, we will discuss an example of Avro serialization and deserialization: creating an Avro data file (serializing data) and then deserializing the same file to read its contents back. This is a continuation of our previous post on Avro Schema, in which we defined a schema for an Employee record and compiled it with avro-tools-1.7.4.jar, which generated the Java code for the schema. In […]

Avro Schema Example Definition

In this post we will discuss the following aspects of Avro schemas: Avro data types, defining a schema, and compiling the schema with code generation. Avro schemas are defined in JSON and are composed of primitive or complex data types.

Primitive Types: Avro’s primitive types are listed below.

“null”: no value
“boolean”: a binary value
“int”: 32-bit signed integer
“long”: 64-bit signed integer
“float”: single precision […]
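To make the primitive type table concrete, here is a minimal sketch that maps each Avro primitive type name to the Java class that typically holds such a value, and checks whether a given value fits. `AvroPrimitiveCheck` and `conforms` are hypothetical illustrations, not part of the Avro library. The entries for `double`, `bytes` and `string` come from the Avro specification rather than the excerpt above, and the classes used for them (`ByteBuffer`, `CharSequence`) follow Avro's generic Java representation.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Avro's API): does a Java value fit an Avro primitive type?
class AvroPrimitiveCheck {

    // Avro primitive type name -> Java class commonly used to hold it.
    // "null" is handled separately because it has no Java class.
    static final Map<String, Class<?>> PRIMITIVES = new LinkedHashMap<>();
    static {
        PRIMITIVES.put("boolean", Boolean.class);      // a binary value
        PRIMITIVES.put("int", Integer.class);          // 32-bit signed integer
        PRIMITIVES.put("long", Long.class);            // 64-bit signed integer
        PRIMITIVES.put("float", Float.class);          // single precision (32-bit) IEEE 754
        PRIMITIVES.put("double", Double.class);        // double precision (64-bit) IEEE 754
        PRIMITIVES.put("bytes", ByteBuffer.class);     // sequence of 8-bit unsigned bytes
        PRIMITIVES.put("string", CharSequence.class);  // unicode character sequence
    }

    static boolean conforms(String avroType, Object value) {
        if (avroType.equals("null")) {
            return value == null; // "null" means no value
        }
        Class<?> expected = PRIMITIVES.get(avroType);
        if (expected == null) {
            throw new IllegalArgumentException("not a primitive type: " + avroType);
        }
        return expected.isInstance(value);
    }

    public static void main(String[] args) {
        System.out.println(conforms("int", 42));         // true
        System.out.println(conforms("long", 42));        // false: 42 autoboxes to Integer
        System.out.println(conforms("string", "Alice")); // true
        System.out.println(conforms("null", null));      // true
    }
}
```

The `long` check failing for a plain `42` mirrors the table: `int` and `long` are distinct types, so a 32-bit value is not silently accepted where a 64-bit one is declared.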

NoClassDefFoundError or ClassNotFoundException

In this post we will discuss two of the most frequent errors in Hadoop MapReduce job execution, NoClassDefFoundError and ClassNotFoundException, and possible solutions for them. Error Scenario: java.lang.ClassNotFoundException or java.lang.NoClassDefFoundError, along with messages such as "Error starting MRAppMaster" and "Container exited with a non-zero exit code 1". When we encounter NoClassDefFoundError or ClassNotFoundException even though the required jar files have been added to our build path in Eclipse, then we have […]
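A quick way to check whether a class is actually visible at runtime, as opposed to only on the Eclipse build path, is to look it up by name. The minimal sketch below uses only the JDK; `ClassLoadingCheck` and `com.example.DoesNotExist` are hypothetical names for illustration. `Class.forName` throws `ClassNotFoundException` when the named class cannot be found, while `NoClassDefFoundError` is an `Error` raised when a class that existed at compile time (or one of its dependencies) can no longer be loaded at runtime.

```java
// Checks whether a class can be loaded from the current runtime classpath.
class ClassLoadingCheck {

    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Explicit lookup by name (Class.forName / ClassLoader.loadClass) failed.
            return false;
        } catch (NoClassDefFoundError e) {
            // The class was located, but a class it depends on could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnClasspath("java.util.ArrayList"));      // true
        System.out.println(isOnClasspath("org.apache.avro.Schema"));   // true only if the Avro jar is on the classpath
        System.out.println(isOnClasspath("com.example.DoesNotExist")); // false
    }
}
```

Running such a check with the same classpath that the job's tasks see, rather than the IDE's, is what distinguishes a build-path problem from a runtime-classpath problem.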

Hadoop Integration – Avro Errors

In this post we will discuss some of the errors and exceptions that can occur when the Avro and Hadoop distributions being integrated are mismatched. If we do not use the correct Avro release, we will run into many errors or exceptions. Here we will consider version compatibility for the Hadoop-2.3.0 release. As it is Hadoop 2, we need to use the […]

Avro Serialization

In this post, we will give a basic introduction to Avro serialization. What is Avro Serialization?: Avro is one of the most popular data serialization and deserialization frameworks, and it integrates well with almost all Hadoop platforms. The Avro framework was created by Doug Cutting, the creator of Hadoop, and it is now a full-fledged project under the Apache Software Foundation. Need for Avro Serialization: Hadoop’s native library provides Writables for data serialization (converting […]
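To illustrate what serializing a record into bytes looks like in the Writable style, the sketch below mirrors the `write(DataOutput)` / `readFields(DataInput)` pair of Hadoop's `Writable` interface using only `java.io`, so it runs without Hadoop on the classpath. The `EmployeeRecord` class and its fields are hypothetical. Note that the bytes carry no field names or types: the reader must call `readFields` with exactly the field order that `write` used, which is one limitation a schema-based format such as Avro addresses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record serialized Writable-style, without depending on Hadoop.
class EmployeeRecord {
    String name;
    int id;

    // Same shape as Writable.write: fields written in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name); // 2-byte length prefix + modified UTF-8 bytes
        out.writeInt(id);   // 4 bytes, big-endian
    }

    // Same shape as Writable.readFields: fields read back in the same order.
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        id = in.readInt();
    }

    // Serialize: object -> byte array.
    static byte[] toBytes(EmployeeRecord r) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            r.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize: byte array -> object.
    static EmployeeRecord fromBytes(byte[] bytes) {
        EmployeeRecord r = new EmployeeRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            r.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        EmployeeRecord e = new EmployeeRecord();
        e.name = "Alice";
        e.id = 7;
        byte[] bytes = toBytes(e);
        EmployeeRecord copy = fromBytes(bytes);
        // prints "11 bytes -> Alice, 7" (2-byte length + 5 name bytes + 4 int bytes)
        System.out.println(bytes.length + " bytes -> " + copy.name + ", " + copy.id);
    }
}
```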

Expected timestamp in the Flume event headers, but it was null

Error Scenario: "Expected timestamp in the Flume event headers, but it was null" (NullPointerException). This error message appears in the ~/logs/flume.log file when starting a Flume agent whose HDFS sink path uses time-based escape sequences (%Y, %m, %d, %H, %M, %S).

If a sink expects a header but does not find it, events will become stuck in the channel and Flume will log NullPointerException and EventDeliveryException errors. Root Cause: We receive […]
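The dependency on the `timestamp` header can be sketched in plain Java. `TimestampPathSketch` is a hypothetical illustration, not Flume's implementation: it expands the time escape sequences in an HDFS sink path from the event's `timestamp` header (epoch milliseconds) and fails with the same message when the header is missing. For simplicity it uses UTC, whereas Flume uses the agent's local time zone unless configured otherwise. In a real agent the header is typically added by a timestamp interceptor on the source, or the sink can be told to use local time instead via `hdfs.useLocalTimeStamp = true`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of escape-sequence expansion driven by the "timestamp" header.
class TimestampPathSketch {

    static String resolve(String pathPattern, Map<String, String> headers) {
        // Without the header there is nothing to derive %Y, %m, ... from.
        String ts = Objects.requireNonNull(headers.get("timestamp"),
                "Expected timestamp in the Flume event headers, but it was null");
        ZonedDateTime t = Instant.ofEpochMilli(Long.parseLong(ts)).atZone(ZoneOffset.UTC);
        return pathPattern
                .replace("%Y", String.format("%04d", t.getYear()))
                .replace("%m", String.format("%02d", t.getMonthValue()))  // month
                .replace("%d", String.format("%02d", t.getDayOfMonth()))
                .replace("%H", String.format("%02d", t.getHour()))
                .replace("%M", String.format("%02d", t.getMinute()))      // minute
                .replace("%S", String.format("%02d", t.getSecond()));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "0"); // 1970-01-01T00:00:00Z
        System.out.println(resolve("/flume/events/%Y/%m/%d/%H", headers)); // /flume/events/1970/01/01/00

        try {
            resolve("/flume/events/%Y/%m/%d/%H", new HashMap<>());
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // the missing-header message
        }
    }
}
```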

channel has been removed due to an error

Error Scenario: "channel has been removed due to an error during configuration". This error message appears in the ~/logs/flume.log file when starting a Flume agent whose JDBC channel configuration is missing or incorrect.

Root Cause: the DerbySchemaHandler.schemaExists error message is received when a Flume agent’s sink is unable to receive events from this JDBC channel, the agent process stops for some reason, and we then try to […]

Mapreduce job stuck at map 0% reduce 0%

Error Scenario: MapReduce job stuck at map 0% reduce 0%: Could not resolve hostname

MapReduce jobs are not running, and the error message below appears when starting the HDFS daemons.

Resolution: The above error messages occur when there is a problem with SSH communication. In this situation, we need to remove openssh-client and openssh-server and reinstall them.

Once SSH is reinstalled, we will […]

Pig Installation on Ubuntu

In this post, we will describe the procedure for installing Pig on an Ubuntu machine. Prerequisite: below are the basic requirements for installing Pig on Ubuntu and getting started: Java 1.6 or later installed, with the JAVA_HOME environment variable set to the Java installation directory; Hadoop 1.x or 2.x installed on the cluster. In this post we will use Hadoop-2.3.0 for the HADOOP_HOME environment variable setup. Pig Installation Procedure: Download the latest stable […]

Short Notes on Java Collections Framework

Java Collections Framework Notes: Collection Interface: The fundamental interface for collection classes in the Java library is the Collection interface.

The add method returns true if adding the element actually changes the collection, and false if the collection is unchanged. The iterator method returns an object that implements the Iterator interface. You can use the iterator object to visit the elements in the collection one by one. Iterator Interface: […]
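The two behaviors described above, the boolean result of `add` and element-by-element traversal with an `Iterator`, can be seen with standard JDK collections. `CollectionNotes` and `joinWithIterator` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;

class CollectionNotes {

    // Visits each element with an explicit Iterator and joins them with commas.
    static String joinWithIterator(Collection<String> items) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            sb.append(it.next());
            if (it.hasNext()) {
                sb.append(",");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collection<String> list = new ArrayList<>();
        Collection<String> set = new HashSet<>();

        System.out.println(list.add("avro")); // true: list changed
        System.out.println(list.add("avro")); // true: lists accept duplicates
        System.out.println(set.add("avro"));  // true: new element
        System.out.println(set.add("avro"));  // false: set unchanged

        System.out.println(joinWithIterator(list)); // avro,avro
    }
}
```

A `HashSet` returns `false` on the second `add` because a set rejects duplicates and the collection does not change, while an `ArrayList` always grows and returns `true`.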