Multi Agent Setup in Flume


In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another via the Avro RPC protocol.

Multi Agent Setup in Flume:

In a multi-agent or multi-hop setup, events travel through multiple agents before reaching the final destination. In multi-agent flows, the sink of the previous agent (e.g. Machine1) and the source of the current hop (e.g. Machine2) need to be of Avro type, with the sink pointing to the hostname or IP address and port of the source machine. Thus the Avro RPC mechanism acts as the bridge between agents in a multi-hop flow.

(Figure: multi-agent flow between two machines)

In this post we will walk through a simple multi-agent setup in Flume to collect events from files on Machine1 via a spooling directory source and a file channel, and deliver them to an HDFS sink on Machine2, with Avro RPC as the bridge between the two machines. From here onwards we will call the agent set up on Machine1 Agent1 and the agent set up on Machine2 Agent2.

Agent1 – Spooling Dir Source, File Channel, Avro Sink:

Below are the configuration properties that need to be set up for Agent1 on Machine1 in the FLUME_CONF_DIR/flume.conf properties file.
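A minimal sketch of this configuration, assuming the agent is named Agent1; the component names and the file channel's checkpoint/data directories are illustrative:

# Agent1: spooling directory source -> file channel -> Avro sink
Agent1.sources = spool-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink

# Spooling directory source watching the input directory on Machine1
Agent1.sources.spool-source.type = spooldir
Agent1.sources.spool-source.spoolDir = /home/user/testflume/spooldir
Agent1.sources.spool-source.channels = file-channel

# Durable file channel (directories below are illustrative)
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /home/user/testflume/checkpoint
Agent1.channels.file-channel.dataDirs = /home/user/testflume/data

# Avro sink pointing at the Avro source of Agent2 on Machine2
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = 251.16.12.112
Agent1.sinks.avro-sink.port = 11111
Agent1.sinks.avro-sink.channel = file-channel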

In the above setup, we are sending events from files in the /home/user/testflume/spooldir location, through a file channel, to port 11111 (we can use any available port) on the remote machine (Machine2) with IP address 251.16.12.112 (a sample IP address, used here for security reasons). Do not start Agent1 until we have set up Agent2 and started it first.

Agent2 – Avro Source, File Channel, HDFS Sink:

So now, events from the Machine1 spool directory are received on Machine2 on port 11111. Next we need to collect these events on Machine2 and put them into HDFS, which is available on Machine2. Here also we use a file channel, to make sure no events are lost even if Agent2 fails in the middle of transmission.

Below are the configuration properties for Agent2 on Machine2, which need to be kept in the FLUME_CONF_DIR/flume.conf file.
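A minimal sketch for Agent2, again with illustrative component names, channel directories, and HDFS target path:

# Agent2: Avro source -> file channel -> HDFS sink
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink

# Avro source listening on port 11111 for events sent by Agent1
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind = 0.0.0.0
Agent2.sources.avro-source.port = 11111
Agent2.sources.avro-source.channels = file-channel

# Durable file channel (directories below are illustrative)
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/user/testflume/checkpoint
Agent2.channels.file-channel.dataDirs = /home/user/testflume/data

# HDFS sink writing to HDFS on Machine2 (path is illustrative)
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = /user/flume/events
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent2.sinks.hdfs-sink.channel = file-channel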

Start the Agents:

Before starting the agents on the two machines:

  • Make sure the parent directories given in the file channels on the two machines are created, and that the users running the agents have write access to these parent directories on both machines.
  • Start the HDFS daemons on Machine2.
  • Copy the input files into the spooling directory.

Now start Agent2 on Machine2 first, and then Agent1 on Machine1. Below are the commands that can be used to start the agents.

Agent2:
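Assuming the standard flume-ng launcher and the configuration file above, something like:

flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent2 -Dflume.root.logger=INFO,console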

Agent1:
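Similarly on Machine1:

flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent1 -Dflume.root.logger=INFO,console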

Below are the screenshots of starting agents Agent2 and Agent1 on the two machines.

(Screenshots: Agent2 and Agent1 startup)

Below is the screenshot of the input spool directory.

(Screenshot: spool directory input)

After a successful start of the agents, the input files will be renamed with a .COMPLETED suffix in the input spool directory, and the files are stored in the HDFS directory as shown in the below screenshot.

(Screenshot: HDFS output)

To illustrate the setup of multiple agents with an example, we used the above agents to transfer events from a spool directory source on one machine into an HDFS sink on another machine. If transferring events from one machine onto another machine's HDFS is our only aim, we can achieve this just by setting up a single agent on the source machine with the below configuration properties (consider this agent as Agent3).
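A minimal sketch for Agent3, reusing the spool directory and file channel from above; the remote HDFS URI (NameNode host and port) is an illustrative assumption:

# Agent3: spooling directory source -> file channel -> remote HDFS sink
Agent3.sources = spool-source
Agent3.channels = file-channel
Agent3.sinks = hdfs-sink

Agent3.sources.spool-source.type = spooldir
Agent3.sources.spool-source.spoolDir = /home/user/testflume/spooldir
Agent3.sources.spool-source.channels = file-channel

Agent3.channels.file-channel.type = file
Agent3.channels.file-channel.checkpointDir = /home/user/testflume/checkpoint
Agent3.channels.file-channel.dataDirs = /home/user/testflume/data

# HDFS sink writing directly to the remote HDFS on Machine2
Agent3.sinks.hdfs-sink.type = hdfs
Agent3.sinks.hdfs-sink.hdfs.path = hdfs://251.16.12.112:8020/user/flume/events
Agent3.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent3.sinks.hdfs-sink.channel = file-channel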

This is the advantage of the HDFS sink: it can write to a remote HDFS cluster directly, without an intermediate hop.

So we have successfully set up a multi-agent hop with Avro RPC as the bridging mechanism and sent data events across machines.




