Flume Agent Setup – Netcat Source, Console Sink

In this post, we will discuss setting up a simple Flume agent using Netcat as the source and the console (logger) as the sink.

In this example of a single-node Flume deployment, we create a Netcat source that listens on a port (localhost:44444) for network connections, and a logger sink to log the received traffic to the console.

For sending network traffic, we can use either the curl utility or the traditional telnet tool. We will use curl in this post.

Flume Agent Setup:

Step 1 – Create flume.conf file:

We have to set up the configuration properties for the Flume agent in a file. This configuration file needs to define the sources, the channels and the sinks, grouped under the agent.

Let’s create a flume.conf properties file under the FLUME_CONF_DIR (FLUME_HOME/conf) location, or any other preferred location, and add the properties below to the file.

This configuration file defines an agent called Agent1. We can give any names to agents, sources, sinks and channels; in this example, we have chosen the descriptive names Agent1, netcat-source, memory-channel and logger-sink.
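The properties file itself was shown as an image in the original post; a minimal flume.conf consistent with the names above (Agent1, netcat-source, memory-channel, logger-sink) would look like the following sketch. The channel capacity value is an illustrative choice, not a requirement:

```properties
# Name the components of Agent1
Agent1.sources = netcat-source
Agent1.channels = memory-channel
Agent1.sinks = logger-sink

# Netcat source listening on localhost:44444
Agent1.sources.netcat-source.type = netcat
Agent1.sources.netcat-source.bind = localhost
Agent1.sources.netcat-source.port = 44444

# Logger sink writes incoming events to the log/console
Agent1.sinks.logger-sink.type = logger

# In-memory channel buffering events between source and sink
Agent1.channels.memory-channel.type = memory
Agent1.channels.memory-channel.capacity = 1000

# Wire the source and sink to the channel
Agent1.sources.netcat-source.channels = memory-channel
Agent1.sinks.logger-sink.channel = memory-channel
```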

We will discuss each property of the configuration file in detail in the next post; until then, let’s not bother about the details of these properties.

NOTE: We can define multiple agents in the same configuration file and launch a specific one using the --name (-n) flag.

Step 2 – Starting Flume Agent:

We can start the Flume agent with the command below, substituting values appropriate for our setup.

An agent is started using a shell script called flume-ng, which is located in the bin directory of the Flume distribution. We need to specify the agent name, the config directory, and the config file on the command line:
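The exact command appeared only in the screenshots; assuming the file from Step 1 was saved as FLUME_HOME/conf/flume.conf, the invocation would look like this. The -Dflume.root.logger override, which forces log output (including the logger sink's events) to the console, is a common addition for this tutorial rather than a requirement:

```
$ cd $FLUME_HOME
$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf \
    --name Agent1 -Dflume.root.logger=INFO,console
```

The --name value must match the agent name used in the properties file (Agent1 here), otherwise no sources or sinks will be started.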

The agent argument in the flume-ng command above tells Flume to start an agent, which is the generic name for a running Flume process.

The agent will now start the source and sink configured in the given properties file. Once the agent has started, no further output will appear on that screen until events arrive.

Below are screenshots of the agent starting up.

[Screenshot: Flume Agent start 1]

[Screenshot: Flume Agent start 2]

In the above screenshots, we can observe log messages stating Processing: logger-sink, Creating channel memory-channel, Channel memory-channel connected to [netcat-source, logger-sink], and finally, at the bottom, Source starting and created serverSocket [/]. These messages are helpful for debugging if something goes wrong and the agent does not start.

From these log messages, we can confirm that the agent Agent1 has started successfully and that netcat-source is listening for network traffic on port 44444.

Step 3 – Feeding events to Flume Agent:

As the agent has started successfully, netcat-source is listening for events from external sources on the localhost:44444 port. So, let’s send events to the Flume agent on that port through the curl or telnet utility.

Below is the command to connect curl to the specified port:
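The command itself appeared only as a screenshot; with a curl build that supports the telnet protocol, connecting to the raw TCP port can be sketched as follows (telnet localhost 44444 is an equally valid alternative):

```
$ curl telnet://localhost:44444
```

Once connected, anything typed is sent to the source line by line; the Netcat source acknowledges each received line with an OK response in its default configuration.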

Once connected to the port, type some sample lines of input, pressing Enter after each line. Press Ctrl+C to close the curl connection.

Below is a screenshot of the second terminal, from which we fed input to netcat-source.

[Screenshot: sending events to netcat-source from another terminal]

Step 4 – Verify the events at Agent Console:

Now we can verify, on the agent console, the events sent over the curl connection. Below is a screenshot of the agent console.

[Screenshot: Flume capturing data on the console]

In the above screenshot, we can clearly see the events (Hello World !!!) received from netcat-source on the agent console.
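For readers who cannot view the screenshot: the logger sink prints each event’s headers plus the body as a hex dump followed by its printable form, roughly like the line below (timestamp and logger-class prefix omitted, and the exact layout may vary by Flume version):

```
Event: { headers:{} body: 48 65 6C 6C 6F 20 57 6F 72 6C 64 20 21 21 21    Hello World !!! }
```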

Step 5 – Stop Flume Agent:

Once we are done capturing network events, we can stop the agent by pressing Ctrl+C.

[Screenshot: stopping the agent]
