In this post we will discuss collecting data into HBase directly through a Flume agent. In our previous posts under the Flume category, we covered the setup of Flume agents for the File Roll, Logger, and HDFS sink types. Here, we explore the details of the HBase sink and its setup with a live example.
As we have already covered the File channel, Memory channel, and JDBC channel, we will use the Spillable Memory channel in this agent setup to complete our coverage of all Flume-supported channels. This does not mean that the HBase sink requires a Spillable Memory channel; it works equally well with the other channel types.
Among source types, we have already covered the Netcat, Exec, Avro, and Sequence Generator sources, so we will explain one more source, the Spooling Directory source, in this agent setup.
Now let's create our agent Agent5 in the flume.conf properties file under the <FLUME_HOME>/conf directory.
Flume data collection into HBase – Spooling Directory Source, HBase Sink and Spillable Memory Channel:
flume.conf file creation:
Add the below configuration properties in the flume.conf file to create Agent5 with the Spooling Directory source, Spillable Memory channel, and HBase sink.
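The configuration listing is not reproduced here; below is a minimal sketch of what such an Agent5 definition could look like. The property names follow Flume's documented Spooling Directory source (`spooldir`), Spillable Memory channel (`SPILLABLEMEMORY`), and HBase sink (`hbase`) types; the table, column family, and spool directory match the values used later in this post, while the component names and capacity values are illustrative.

```properties
# Agent5: Spooling Directory source -> Spillable Memory channel -> HBase sink
Agent5.sources = spool-source
Agent5.channels = spillable-channel
Agent5.sinks = hbase-sink

# Spooling Directory source: reads files dropped into spoolDir, one event per line
Agent5.sources.spool-source.type = spooldir
Agent5.sources.spool-source.spoolDir = /usr/lib/flume/spooldir
Agent5.sources.spool-source.channels = spillable-channel

# Spillable Memory channel: spills to disk when the in-memory queue fills up
Agent5.channels.spillable-channel.type = SPILLABLEMEMORY
Agent5.channels.spillable-channel.memoryCapacity = 10000
Agent5.channels.spillable-channel.overflowCapacity = 1000000

# HBase sink: writes each event as a row into test_table under column family test_cf
Agent5.sinks.hbase-sink.type = hbase
Agent5.sinks.hbase-sink.table = test_table
Agent5.sinks.hbase-sink.columnFamily = test_cf
Agent5.sinks.hbase-sink.channel = spillable-channel
```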
Configuration Before Agent Startup:
- Before starting this agent, we need to make sure the following things are ready.
- Start the Hadoop and YARN daemons, and also the HBase daemons. Make sure all the daemons are up and running properly; otherwise we will run into a lot of issues. For help with Hadoop installation and running daemons, refer to our previous posts under the Hadoop category; for the same on HBase, refer to the HBase installation post. The commands below will be helpful for performing these activities.
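As a sketch, assuming a standard single-node install where the Hadoop and HBase `bin`/`sbin` directories are on the PATH, the daemons can be started and verified as follows:

```shell
# Start HDFS daemons (NameNode, DataNode, SecondaryNameNode)
start-dfs.sh
# Start YARN daemons (ResourceManager, NodeManager)
start-yarn.sh
# Start HBase daemons (HMaster, RegionServer, and ZooKeeper if managed by HBase)
start-hbase.sh
# Verify that all the daemons are running
jps
```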
- In HBase, create the table with the column family specified in the flume.conf file.
Below is a screenshot of the terminal showing creation of the HBase table through the HBase shell after starting all the daemons. In our agent, test_table and test_cf are the table and column family respectively.
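The table creation shown in the screenshot can also be done non-interactively by piping commands into the HBase shell; the snippet below assumes the test_table and test_cf names from our agent configuration:

```shell
# Create the table with its column family
echo "create 'test_table', 'test_cf'" | hbase shell
# Confirm the table now exists
echo "list" | hbase shell
```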
- Create the folder specified as the spooling directory path, and make sure that the flume user has read, write, and execute access to that folder. In our agent, it is the /usr/lib/flume/spooldir directory.
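A sketch of this step, assuming a user and group both named `flume` (adjust the ownership to match your installation):

```shell
# Create the spooling directory used in our agent configuration
sudo mkdir -p /usr/lib/flume/spooldir
# Give the flume user read+write+execute access to it
sudo chown flume:flume /usr/lib/flume/spooldir
sudo chmod 770 /usr/lib/flume/spooldir
```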
- We will copy our input files into the spool directory, from which Flume will write each line as a new row into the HBase table. We will copy the input file wordcount.hql into the spooling directory; its contents are shown below.
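Assuming wordcount.hql sits in the current directory, copying it into the spool directory is enough to trigger ingestion; the Spooling Directory source renames a file once it has been fully consumed (by default with a .COMPLETED suffix):

```shell
# Drop the input file into the spooling directory; Flume picks it up automatically
cp wordcount.hql /usr/lib/flume/spooldir/
# After ingestion, the file should appear as wordcount.hql.COMPLETED
ls /usr/lib/flume/spooldir/
```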