Flume Agent – Collect Data From Command to a Flat File


In this post, we will discuss Flume agent configuration and setup for collecting the output of a command-line tool into a flat file.

We will use an Exec source, a File channel and a File Roll sink in our agent configuration. Let's name our agent Agent2. We will focus on configuring and deploying the agent first, and discuss each component and its additional properties at the end of this post.

Flume Agent – Exec Source, File Roll Sink and File Channel:

Let's create the agent Agent2 in the flume.conf properties file under the Flume_Home/conf directory. We can either append the new agent's properties to the bottom of the existing flume.conf file or create a new file that contains only our agent.

Add the following properties to the flume.conf file.
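As a minimal sketch, assuming a Flume 1.x installation under FLUME_HOME (defaulting to ~/flume here), the script below writes an Agent2 definition with an Exec source, a File channel and a File Roll sink to a separate conf/agent2.conf file, which is the "new file" option described above. The source name exec-source, the AGENT2_BASE location (~/flume-agent2 by default) and the tail -F /var/log/syslog command are assumptions; file-channel and file-sink match the component names used in this post.

```sh
#!/bin/sh
# Sketch: write a standalone Agent2 definition (Exec source -> File channel -> File Roll sink).
# Assumptions: FLUME_HOME defaults to ~/flume, AGENT2_BASE to ~/flume-agent2, and
# "exec-source" is a hypothetical source name. Adjust all of them to your setup.
FLUME_HOME="${FLUME_HOME:-$HOME/flume}"
AGENT2_BASE="${AGENT2_BASE:-$HOME/flume-agent2}"
CONF_FILE="$FLUME_HOME/conf/agent2.conf"

mkdir -p "$FLUME_HOME/conf"

# Unquoted EOF: $AGENT2_BASE is expanded now, so the file contains absolute paths.
cat > "$CONF_FILE" <<EOF
# Components of Agent2
Agent2.sources = exec-source
Agent2.channels = file-channel
Agent2.sinks = file-sink

# Exec source: every line printed by the command becomes one event
Agent2.sources.exec-source.type = exec
Agent2.sources.exec-source.command = tail -F /var/log/syslog
Agent2.sources.exec-source.channels = file-channel

# File channel: events are persisted on local disk until the sink takes them
Agent2.channels.file-channel.type = file
Agent2.channels.file-channel.checkpointDir = $AGENT2_BASE/checkpoint
Agent2.channels.file-channel.dataDirs = $AGENT2_BASE/data

# File Roll sink: writes events to files in a local directory
Agent2.sinks.file-sink.type = file_roll
Agent2.sinks.file-sink.channel = file-channel
Agent2.sinks.file-sink.sink.directory = $AGENT2_BASE/output
# 0 disables time-based rolling, so a single file stays open until the agent stops
Agent2.sinks.file-sink.sink.rollInterval = 0
EOF

echo "wrote $CONF_FILE"
```

To keep everything in flume.conf instead, point CONF_FILE at that file and replace > with >> so the properties are appended rather than overwriting it. Setting sink.rollInterval = 0 is a choice made in this sketch so that one output file stays open while the agent runs; leave it out to get the sink's default time-based rolling.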

Before starting the agent, make sure of the following:

  • The directory given in the Agent2.sinks.file-sink.sink.directory property must already exist, and the flume user must have write access to it. The Flume process will not create this directory on the fly, even if the flume user has permission to create it.
  • The flume user must have write access to the Agent2.channels.file-channel.checkpointDir and Agent2.channels.file-channel.dataDirs locations. Either create these directories before starting the agent, or make sure the flume user can write to their parent path, in which case the Flume JVM process creates these folders and files on the fly.
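A small pre-flight script can take care of both points. This is a sketch that assumes the three directories live under a hypothetical AGENT2_BASE (default ~/flume-agent2) and that you run it as the user that will run the agent; adjust the paths to whatever your configuration uses.

```sh
#!/bin/sh
# Sketch: create Agent2's directories before the first start and check write access.
# Run it as the agent's user. AGENT2_BASE is an assumed location and must match the
# sink.directory, checkpointDir and dataDirs values in the agent's configuration.
AGENT2_BASE="${AGENT2_BASE:-$HOME/flume-agent2}"

for dir in "$AGENT2_BASE/output" "$AGENT2_BASE/checkpoint" "$AGENT2_BASE/data"; do
  # The File Roll sink will not create sink.directory itself, so create it (and the channel dirs) here
  mkdir -p "$dir" || exit 1
  # -w tests write permission for the user running this script
  if [ -w "$dir" ]; then
    echo "ok: $dir"
  else
    echo "not writable by $(id -un): $dir" >&2
    exit 1
  fi
done
```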
Start Flume Agent:

Now start the Flume agent by running the following command in a terminal:
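A typical way to start an agent with the standard flume-ng launcher is shown below. It assumes you run it from the Flume installation directory and that the agent's properties are in conf/agent2.conf (a hypothetical file name; use conf/flume.conf if you appended them there). The value of --name must match the Agent2 prefix used in the properties, and -Dflume.root.logger=INFO,console sends the agent's log output to the terminal.

```sh
# Run from the Flume installation directory (FLUME_HOME); agent2.conf is an assumed file name
bin/flume-ng agent --conf conf --conf-file conf/agent2.conf --name Agent2 -Dflume.root.logger=INFO,console
```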

Below is a screenshot of the started agent:

Flume Agent2

After letting the agent run for a while, stop it by pressing Ctrl+C.

Now open the output directory in another terminal; we can see the new files created there. Below is a screenshot of the new files and their contents.

Flume Agent2 Output

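In the above screenshot, we can observe the log messages copied from the /var/log/syslog file into the 1411*-1 file; the File Roll sink names its files with the sink's start time in epoch milliseconds followed by a sequence number. The Flume agent keeps this file open for writing and closes it only when the agent is stopped with Ctrl+C. That holds when time-based rolling is disabled with sink.rollInterval = 0; with the default interval of 30 seconds, the sink closes the current file and starts a new one every 30 seconds.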

We can also observe the files created under the File channel’s checkpoint directory and data directory locations.
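To check the same things from a second terminal, a sketch like the one below lists the sink's output directory and the File channel's directories, then prints the last few lines of the newest output file. AGENT2_BASE and its output, checkpoint and data subdirectories are assumed locations; point them at the paths in your configuration.

```sh
#!/bin/sh
# Sketch: inspect what Agent2 wrote. AGENT2_BASE is an assumed location (default
# ~/flume-agent2) holding the sink's output dir and the File channel's dirs.
AGENT2_BASE="${AGENT2_BASE:-$HOME/flume-agent2}"

# List the File Roll sink output and the File channel checkpoint/data directories
for dir in output checkpoint data; do
  echo "== $AGENT2_BASE/$dir"
  ls -l "$AGENT2_BASE/$dir" 2>/dev/null || echo "(missing)"
done

# Print the last 5 lines of the most recently modified output file, if there is one
newest=$(ls -t "$AGENT2_BASE/output" 2>/dev/null | head -n 1)
if [ -n "$newest" ]; then
  echo "== last lines of $newest"
  tail -n 5 "$AGENT2_BASE/output/$newest"
fi
```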

Flume channel op

We have now successfully configured Agent2 with an Exec source, a File channel and a File Roll sink. Next, let's take a deeper look at each component used in this agent.

Details of Components:

Exec Source:

The Exec source runs a given Unix command when the agent starts and expects that process to keep producing data on standard output; each line of output becomes one event for the Flume agent. If the process exits for any reason, the source also exits and produces no further data. This makes the source best suited to commands that produce a continuous stream of data, rather than commands that print a single result and exit.
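To make the "command exits, source stops" behaviour concrete, here is a toy model in plain shell. It is not how Flume implements the Exec source, and exec_source is a hypothetical helper name; it only mimics the line-per-event, stop-on-exit semantics described above.

```sh
#!/bin/sh
# Toy model of the Exec source behaviour (plain shell, not Flume code): run a command,
# treat every line it prints as one event, and stop for good when the command exits.
exec_source() {
  "$@" | while IFS= read -r line; do
    printf 'event: %s\n' "$line"
  done
  echo "command exited; no further events" >&2
}

# A command that prints three lines and exits yields exactly three events, then stops.
exec_source printf 'a\nb\nc\n'

# A continuous command such as `tail -F /var/log/syslog` would keep the loop running
# until it is killed, which is why the Exec source suits streaming commands.
```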

In our example above, we used the following Unix command:

The tail command displays the contents of a file starting from its end. Below are some examples of its usage.
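The script below is a quick sketch of typical tail usage: it runs tail with and without -n on a throwaway 15-line file created with mktemp, and shows the follow options only in comments because they run until interrupted.

```sh
#!/bin/sh
# Demonstrate tail's default and -n behaviour on a throwaway file created with mktemp.
demo=$(mktemp)
seq 1 15 > "$demo"              # the file holds the lines 1 .. 15

default_out=$(tail "$demo")     # no options: last 10 lines, i.e. 6 .. 15
last3_out=$(tail -n 3 "$demo")  # -n 3: only the last 3 lines, i.e. 13 .. 15
printf '%s\n' "$default_out" "---" "$last3_out"

rm -f "$demo"

# Following a growing file (runs until Ctrl+C, so it is left commented out here):
# tail -f /var/log/syslog   # -f keeps printing lines as they are appended
# tail -F /var/log/syslog   # -F (GNU and BSD tail) also reopens the file after log rotation
```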

By default, it displays the last 10 lines of a file. It accepts the following arguments: