HBase & Solr Search Integration


HBase & Solr – Near Real-Time Indexing and Search

Requirements:

A. HBase Table

B. Solr collection on HDFS

C. Lily HBase Indexer

D. Morphline Configuration file

Once the Solr server is ready, we can configure our collection (in SolrCloud), which will be linked to the HBase table.

  • Add the properties below to the hbase-site.xml file (see the first sketch after this list).
  • Add the properties below to /etc/hbase-solr/conf/hbase-indexer-site.xml (see the second sketch after this list). This enables the Lily indexer to reach the HBase cluster for indexing. Replace the property values with your own; set the hbase-cluster-zookeeper value as configured in hbase-site.xml (for a local environment its value is localhost).
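The exact properties depend on your distribution; a minimal sketch of the hbase-site.xml additions for Lily/SEP-based replication, assuming a typical CDH-style setup, looks like this:

  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
  <property>
    <name>replication.source.ratio</name>
    <value>1.0</value>
  </property>
  <property>
    <name>replication.source.nb.capacity</name>
    <value>1000</value>
  </property>
  <property>
    <!-- Route replication through the Lily SEP so the indexer receives mutations -->
    <name>replication.replicationsource.implementation</name>
    <value>com.ngdata.sep.impl.SepReplicationSource</value>
  </property>

And a sketch of /etc/hbase-solr/conf/hbase-indexer-site.xml, where hbase-cluster-zookeeper is a placeholder for your ZooKeeper quorum (localhost in a local environment):

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hbase-cluster-zookeeper</value>
  </property>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>hbase-cluster-zookeeper:2181</value>
  </property>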

  • Restart the affected services (HBase and the Lily HBase Indexer) so the new properties take effect, as sketched below.
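On a package-based (non-Cloudera Manager) install, the restarts would look roughly like the following; the service names are assumptions and may differ on your distribution (with Cloudera Manager, restart the HBase and Key-Value Store Indexer services from the UI):

  sudo service hbase-master restart
  sudo service hbase-regionserver restart
  sudo service hbase-solr-indexer restart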

  • Create an HBase table with replication
Since the HBase Indexer works by acting as a replication sink, we need to make sure that replication is enabled in HBase. You can activate replication using Cloudera Manager by clicking HBase Service->Configuration->Backup and ensuring “Enable HBase Replication” and “Enable Indexing” are both checked.
In addition, the column family in the HBase table that needs to be replicated must have replication enabled. This is done by setting the REPLICATION_SCOPE flag when the column family is created, as shown below:
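A minimal sketch from the HBase shell; the table name 'record' and column family 'data' are illustrative choices used throughout this walkthrough:

  hbase shell
  # REPLICATION_SCOPE => 1 marks the 'data' column family for replication
  create 'record', {NAME => 'data', REPLICATION_SCOPE => 1}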

  • Create a SolrCloud collection
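For example, the collection configuration directory can be generated with solrctl; the instance directory name hbase-collection1 is illustrative:

  solrctl instancedir --generate $HOME/hbase-collection1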

Once you run the above command, go to $HOME/hbase-collection1/conf, which contains the Solr configuration files. Edit the schema.xml file with your own schema; for this use case we add the field tag below, which corresponds to the HBase column family (data).
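A sketch of the field definition added to schema.xml; the field name matches the HBase column family, while the text_general type is an assumption and can be replaced with whatever type fits your data:

  <field name="data" type="text_general" indexed="true" stored="true"/>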

  • Create a SolrCloud collection with the above schema.xml
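Uploading the instance directory and creating the collection would then look like this (same illustrative name as above):

  solrctl instancedir --create hbase-collection1 $HOME/hbase-collection1
  solrctl collection --create hbase-collection1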

Creating a Lily HBase Indexer configuration
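A minimal sketch of $HOME/morphline-hbase-mapper.xml, assuming the illustrative 'record' table created above and the morphline file path used later in this post:

  <?xml version="1.0"?>
  <indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
    <!-- Points at the morphline file and the morphline id defined inside it -->
    <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
    <param name="morphlineId" value="morphline1"/>
  </indexer>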

Creating a Morphline Configuration File
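A minimal sketch of the morphline in /etc/hbase-solr/conf/morphlines.conf, extracting every cell of the 'data' column family into a Solr field named 'data'; the morphline id and field names are assumptions:

  morphlines : [
    {
      id : morphline1
      importCommands : ["org.kitesdk.**", "com.ngdata.**"]
      commands : [
        {
          # Copy HBase cells from the 'data' column family into the Solr 'data' field
          extractHBaseCells {
            mappings : [
              {
                inputColumn : "data:*"
                outputField : "data"
                type : string
                source : value
              }
            ]
          }
        }
        { logDebug { format : "output record: {}", args : ["@{}"] } }
      ]
    }
  ]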

Starting & Registering a Lily HBase Indexer configuration with the Lily HBase Indexer Service

 

  • Start the hbase-indexer:
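On a package install this is typically done with the service script shown below (the service name is an assumption; with Cloudera Manager, restart the Key-Value Store Indexer service instead):

  sudo service hbase-solr-indexer restart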

  • Register the indexer

The Lily HBase Indexer service provides a command-line utility that can be used to add, list, update, and delete indexer configurations. The command shown below registers an indexer configuration with the HBase Indexer by passing an indexer configuration XML file, along with the ZooKeeper ensemble information used by HBase and Solr and the Solr collection name.
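A sketch of the registration command; the indexer name, ZooKeeper hosts, and collection name are placeholders to replace with your own:

  hbase-indexer add-indexer \
    --name myIndexer \
    --indexer-conf $HOME/morphline-hbase-mapper.xml \
    --connection-param solr.zk=zk01.example.com:2181/solr \
    --connection-param solr.collection=hbase-collection1 \
    --zookeeper zk01.example.com:2181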

  • Verify that the indexer was successfully created as follows:
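For example, the registered indexers can be listed with:

  hbase-indexer list-indexers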

Verifying that indexing is working

Add rows to the indexed HBase table. For example:
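A small sketch from the HBase shell, using the illustrative 'record' table and 'data' column family from earlier:

  hbase shell
  put 'record', 'row1', 'data:message', 'hello world'
  put 'record', 'row2', 'data:message', 'hello solr'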

If the put operation succeeds, wait a few seconds, then navigate to Search in the Hue UI and query the data. Note the updated rows in Search.

Configuring Lily HBase NRT Indexer Service for Use with Cloudera Search

Using the Lily HBase NRT Indexer Service

Steps to build indexing

  1. solrctl instancedir --generate $HOME/hbase-collection4
  2. rm -rf /home/babusi02/hbase-collection5/conf/schema.xml
  3. rm -rf /home/babusi02/hbase-collection4/conf/solrconfig.xml
  4. cp /home/babusi02/hbase-collection2/conf/schema.xml /home/babusi02/hbase-collection5/conf/
  5. cp /home/babusi02/hbase-collection2/conf/solrconfig.xml /home/babusi02/hbase-collection4/conf/
  6. nano /home/babusi02/hbase-collection4/conf/schema.xml
  7. nano /home/babusi02/hbase-collection4/conf/solrconfig.xml
  8. solrctl instancedir --create hbase-collection4 $HOME/hbase-collection4
  9. solrctl collection --create hbase-collection4
  10. nano $HOME/morphline-hbase-mapper.xml
  11. nano /etc/hbase-solr/conf/morphlines.conf
  12. hbase-indexer add-indexer --name Indexer6 --indexer-conf $HOME/morphline-hbase-mapper.xml --connection-param solr.zk=dayrhegapd016.enterprisenet.org:2181,dayrhegapd015.enterprisenet.org:2181,dayrhegapd014.enterprisenet.org:2181,dayrhegapd020.enterprisenet.org:2181,dayrhegapd019.enterprisenet.org:2181/solr --connection-param solr.collection=hbase-collection6 --zookeeper dayrhegapd020.enterprisenet.org:2181
  13. hbase-indexer list-indexers

