In this post we will discuss setting up multiple agents in a Flume flow and passing events from one machine to another over the Avro RPC protocol.
Multi Agent Setup in Flume:
In a multi-agent (or multi-hop) setup, events travel through multiple agents before reaching their final destination. In multi-agent flows, the sink of the previous hop (e.g. Machine1) and the source of the current hop (e.g. Machine2) both need to be of avro type, with the sink pointing to the hostname or IP address and port of the source machine. The Avro RPC mechanism thus acts as the bridge between agents in a multi-hop flow.
In this post we will discuss a simple multi-agent setup in Flume to collect events from files on Machine1 via a spooling directory source, a file channel, and an HDFS sink on Machine2. We will use Avro RPC as the bridge between these two machines. From here onwards we will call the agent being set up on Machine1 Agent1 and the agent being set up on Machine2 Agent2.
Agent1 – Spooling Dir Source, File Channel, Avro Sink:
Below are the configuration properties that need to be set up for Agent1 on Machine1 in the FLUME_CONF_DIR/flume.conf properties file.
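A sketch of what this configuration could look like, based on the source, channel, and sink types described above. The component names, checkpoint and data directories are assumptions for illustration; only the spooling directory path, target IP, and port come from this post.

```properties
# Agent1: spooling directory source -> file channel -> Avro sink
# (component names and file-channel paths are assumed for illustration)
Agent1.sources = spool-source
Agent1.channels = file-channel
Agent1.sinks = avro-sink

# Spooling directory source watching the input directory on Machine1
Agent1.sources.spool-source.type = spooldir
Agent1.sources.spool-source.spoolDir = /home/user/testflume/spooldir
Agent1.sources.spool-source.channels = file-channel

# Durable file channel (checkpoint and data dirs are assumed paths)
Agent1.channels.file-channel.type = file
Agent1.channels.file-channel.checkpointDir = /home/user/testflume/filechannel/checkpoint
Agent1.channels.file-channel.dataDirs = /home/user/testflume/filechannel/data

# Avro sink pointing at Machine2's Avro source
Agent1.sinks.avro-sink.type = avro
Agent1.sinks.avro-sink.hostname = 251.16.12.112
Agent1.sinks.avro-sink.port = 11111
Agent1.sinks.avro-sink.channel = file-channel
```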
In the above setup, we are sending events from files in /home/user/testflume/spooldir, through a file channel, to port 11111 (any available port can be used) on the remote machine (Machine2) with IP address 251.16.12.112 (for security reasons, a sample IP address is used here). Do not start Agent1 until Agent2 has been set up and started first.
Agent2 – Avro Source, File Channel, HDFS Sink:
So now, events from Machine1's spool directory are received on port 11111 of Machine2. We need to collect these events and put them onto the HDFS running on Machine2. Here too we use a file channel, to make sure no events are lost even if Agent2 fails mid-transmission.
Below are the configuration properties for Agent2 on Machine2, which need to be kept in the FLUME_CONF_DIR/flume.conf file.
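A sketch of Agent2's configuration, matching the Avro source, file channel, and HDFS sink described above. The component names, file-channel paths, and HDFS path are assumptions for illustration; only the port comes from this post, and the Avro source's port must match the one used by Agent1's Avro sink.

```properties
# Agent2: Avro source -> file channel -> HDFS sink
# (component names, channel paths, and HDFS path are assumed for illustration)
Agent2.sources = avro-source
Agent2.channels = file-channel
Agent2.sinks = hdfs-sink

# Avro source listening on the port Agent1's sink sends to
Agent2.sources.avro-source.type = avro
Agent2.sources.avro-source.bind = 0.0.0.0
Agent2.sources.avro-source.port = 11111
Agent2.sources.avro-source.channels = file-channel

# Durable file channel (checkpoint and data dirs are assumed paths)
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = /home/user/testflume/filechannel/checkpoint
Agent2.channels.file-channel.dataDirs = /home/user/testflume/filechannel/data

# HDFS sink writing events onto Machine2's HDFS (assumed target path)
Agent2.sinks.hdfs-sink.type = hdfs
Agent2.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/events
Agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent2.sinks.hdfs-sink.channel = file-channel
```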
Start the Agents:
Before starting the agents on the two machines:
- Make sure the parent directories specified for the file channels exist on both machines, and that the users running the agents have write access to them.
- Start HDFS daemons on Machine2.
- Copy the input files into spooling directory.
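The preparation steps above can be sketched as a short script. The /home/user/testflume layout follows the configuration discussed earlier, but here it is parameterized through an assumed FLUME_BASE variable so it can be adjusted per machine; the *.log glob for input files is also an assumption.

```shell
#!/bin/sh
# Prepare directories before starting the agents.
# FLUME_BASE is an assumed base path; this post uses /home/user/testflume.
FLUME_BASE="${FLUME_BASE:-$HOME/testflume}"

mkdir -p "$FLUME_BASE/spooldir"               # spooling directory source (Machine1)
mkdir -p "$FLUME_BASE/filechannel/checkpoint" # file channel checkpoint dir
mkdir -p "$FLUME_BASE/filechannel/data"       # file channel data dir

# The user running the agent must have write access to these directories.
chmod -R u+rwx "$FLUME_BASE"

# Copy input files into the spooling directory (file pattern is an assumption).
cp ./*.log "$FLUME_BASE/spooldir"/ 2>/dev/null || true
```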
Now start Agent2 on Machine2 first, and then Agent1 on Machine1. Below are the commands that can be used to start the agents.
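Assuming a standard Flume installation with flume-ng on the PATH and FLUME_CONF_DIR set, the start commands would look something like the following; the --name value must match the agent name used in each machine's flume.conf (Agent1 and Agent2 here).

```shell
# On Machine2 first: start Agent2 so its Avro source is listening
# before Agent1's Avro sink tries to connect.
flume-ng agent --conf "$FLUME_CONF_DIR" \
    --conf-file "$FLUME_CONF_DIR/flume.conf" \
    --name Agent2 -Dflume.root.logger=INFO,console

# Then on Machine1: start Agent1.
flume-ng agent --conf "$FLUME_CONF_DIR" \
    --conf-file "$FLUME_CONF_DIR/flume.conf" \
    --name Agent1 -Dflume.root.logger=INFO,console
```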