Data Collection from HTTP Client into HBase

This post provides a proof of concept (POC) of data collection from an HTTP client into HBase. We will set up a Flume agent with an HTTP source, a JDBC channel and an AsyncHBase sink.

We first concentrate on the POC itself. At the end of this post we go into the details of each component used to set up this agent.

Now let's create our agent, Agent6, in the flume.conf properties file under the <FLUME_HOME>/conf directory.

Data collection from HTTP client into HBase – Flume Agent – HTTP Source, AsyncHBase and JDBC Channel:

Add the configuration properties below to the flume.conf file to create Agent6 with an HTTP source, a JDBC channel and an AsyncHBase sink.
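As a rough sketch of what such a configuration can look like, the fragment below wires the three components together. The component names (http-source, jdbc-channel, hbase-sink), the bind address, port 8989, the table name test_table and the column family test_cf are all assumptions made for illustration, so substitute the values from your own setup.

```properties
# Sketch only: component names, host, port, table and column family are assumed values.
Agent6.sources = http-source
Agent6.channels = jdbc-channel
Agent6.sinks = hbase-sink

# HTTP source: listens for POST requests; JSONHandler is the default handler.
Agent6.sources.http-source.type = http
Agent6.sources.http-source.bind = localhost
Agent6.sources.http-source.port = 8989
Agent6.sources.http-source.channels = jdbc-channel

# JDBC channel: persists events in a database (embedded Derby by default).
Agent6.channels.jdbc-channel.type = jdbc

# AsyncHBase sink: the table and column family must already exist in HBase,
# e.g. created in the HBase shell with: create 'test_table', 'test_cf'
Agent6.sinks.hbase-sink.type = asynchbase
Agent6.sinks.hbase-sink.table = test_table
Agent6.sinks.hbase-sink.columnFamily = test_cf
Agent6.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
Agent6.sinks.hbase-sink.channel = jdbc-channel
```

Note that a source takes a `channels` list while a sink takes a single `channel`. Mixing the two up is a common reason for an agent failing to start.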

Configuration Before Agent Startup:
  • Start the Hadoop and YARN daemons, then start the HBase daemons. Make sure all daemons have started successfully.
  • In HBase, create the table with the column family specified in the flume.conf file.

The terminal screenshots below show these steps.

[Screenshot: start of all daemons]

[Screenshot: HBase table creation]

  • Create an HTTP client to post our input file to the HTTP source at the configured hostname and port number.

Flume HTTP Source’s default handler is org.apache.flume.source.http.JSONHandler, so we need to create the input for this handler in JSON format, as shown below. Further details on this handler and the formats it supports are discussed at the bottom of this post. For now, let's put the lines of text below into the input file.
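For illustration, an input file in the shape JSONHandler accepts might look like the sample below: a JSON array of events, each with a "headers" map and a string "body". The header names and body text are invented sample values, not the original input from this post.

```json
[
  {
    "headers": { "host": "client1.example.com", "type": "sample" },
    "body": "first sample event posted to the Flume HTTP source"
  },
  {
    "headers": { "host": "client1.example.com", "type": "sample" },
    "body": "second sample event posted to the Flume HTTP source"
  }
]
```

Each element of the array becomes one Flume event, so this file would produce two events in the channel.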

To create the HTTP client, we use the Java code below. This application sends a JSON document to a remote web server using HTTP POST.
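A minimal sketch of such a client is shown here. It uses only the JDK's HttpURLConnection rather than any particular HTTP library, and the class name FlumeHttpClient, the default URL http://localhost:8989 and the default file name input.json are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical client class; the name and defaults are not from the original post.
class FlumeHttpClient {

    /** Sends the JSON text to the given URL with HTTP POST and returns the status code. */
    public static int postJson(String url, String json) throws IOException {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true); // we are going to write a request body
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setFixedLengthStreamingMode(payload.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            // Reading the status code completes the request.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: adjust to the bind address, port and file you actually use.
        String url = args.length > 0 ? args[0] : "http://localhost:8989";
        String file = args.length > 1 ? args[1] : "input.json";

        String json = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
        int status = postJson(url, json);
        System.out.println("HTTP status: " + status);
    }
}
```

After compiling, it can be run as `java FlumeHttpClient http://localhost:8989 input.json`. A 200 status indicates that the HTTP source accepted the events. Any other status is worth checking against the Flume agent log.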