Flume Avro Client – Collecting a Remote File into Local File 1


In this post, we will discuss about setup of a Flume Agent using Avro Client, Avro Source, JDBC Channel, and File Roll sink.

First we will create Agent3 in flume.conf file under FLUME_HOME/conf directory.

Flume Agent – Avro Source, JDBC Channel and File Roll Sink:

  • Add the below configuration properties in flume.conf file to create Agent3.

  •  Make sure /usr/lib/flume/agent/files/ directory is created and Flume use has write permissions to this location.
  • Create a sample file for giving it as input to Avro Client. Lets create AvroClientInput.txt in the home directory itself.
  • Now start the agent with below command.

 Below is the screen shot of starting agent from terminal.

Flume Agent3

  • Now open the host connection through Avro client to send the test file to specified port with the below command in another terminal.

An Avro client included in the Flume distribution can send a given file to Flume Avro source with the above command. The above command will send the contents of /home/siva/AvroClientInput.txt file to the Flume source listening on localhost:11111 port.

Below is the screen shot of another terminal. We have created test input file first and then avro-client is used to send the file at localhost:11111 port.

Flume Avro Client

  • Validate the output in the destination directory /usr/lib/flume/agent/files/ .

Flume Avro Client Op

So, we have successfully configured the agent with Avro Source and JDBC channel into File Roll sink.

Below are the indepth details about Avro source and JDBC Channels.

Avro Source:

Avro is a data serialization framework and it manages the packaging and transport of data from one point to another point across the network. An Avro Source Listens on Avro port and receives events from external Avro client streams. When connected with the built-in Avro Sink on another Flume agent, it can create chain-agents topology.

By default, Flume  distribution supports both an Avro source and a standalone Avro client. Avro Client reads a file and sends it to an Avro source anywhere on the network and need not to be on same local machine as we used in our example. But if it is outside the local machine Avro client requires the explicit hostname and port of the Avro source to which it should send the file.

Finally, Avro collects the events from file sent by avro client and passes them to file sink.

Below are the properties related to avro source. Required properties are in bold.

Property Name Default Description
type The component type name, needs to be avro
bind hostname or IP address to listen on
port Port # to bind to
threads Maximum number of worker threads to spawn
compression-type none This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource
ssl FALSE Set this to true to enable SSL encryption
JDBC Channel:

Similar to File channel, JDBC channel also provide persistent storage of events to prevent event loss in case of agent failure. By default JDBC channel uses an embedded Derby database to store events. This is a durable channel that’s ideal for flows where recoverability is important.

In below properties table, required properties are in bold.

Property Name Default Description
type The component type name, needs to be jdbc
db.type DERBY Database vendor, needs to be DERBY.
driver.class org.apache.derby.jdbc.EmbeddedDriver Class for vendor’s JDBC driver
driver.url (constructed from other properties) JDBC connection URL
db.username “sa” User id for db connection
db.password password for db connection
connection.properties.file JDBC Connection property file path
create.schema TRUE If true, then creates db schema if not there
create.index TRUE Create indexes to speed up lookups
create.foreignkey TRUE  
transaction.isolation “READ_COMMITTED” Isolation level for db session READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE, REPEATABLE_READ
maximum.connections   Max connections allowed to db
maximum.capacity   Max number of events in the channel
sysprop.*   DB Vendor specific properties
sysprop.user.home   Home path to store embedded Derby database

About Siva

Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.


Leave a comment

Your email address will not be published. Required fields are marked *

One thought on “Flume Avro Client – Collecting a Remote File into Local File


Review Comments
default image

I have attended Siva’s Spark and Scala training. He is good in presentation skills and explaining technical concepts easily to everyone in the group. He is having excellent real time experience and provided enough use cases to understand each concepts. Duration of the course and time management is awesome. Happy that I found a right person on time to learn Spark. Thanks Siva!!!

Dharmeswaran ETL / Hadoop Developer Spark Nov 2016 September 21, 2017

.