HBase & Solr Search Integration

HBase & Solr – Near Real Time Indexing and Search. Requirements: A. HBase table B. Solr collection on HDFS C. Lily HBase Indexer D. Morphline configuration file. Once the Solr server is ready, we can configure our collection (in SolrCloud), which will be linked to the HBase table. Add the properties below to the hbase-site.xml file, and add the properties below to /etc/hbase-solr/conf/hbase-indexer-site.xml; this enables the Lily indexer to reach the HBase cluster for indexing. […]
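
On a Cloudera-style setup, the collection and indexer wiring described above can be sketched with the solrctl and hbase-indexer CLIs. This is a hedged sketch, not the post's elided steps: the collection name, morphline mapper filename, and ZooKeeper addresses are placeholders, and the exact hbase-site.xml / hbase-indexer-site.xml properties (typically enabling HBase replication for the Lily side-effect processor) should be taken from the Lily/Cloudera documentation.

```
# Generate a Solr instance directory, upload it, and create the collection
solrctl instancedir --generate $HOME/hbase-collection
solrctl instancedir --create hbase-collection $HOME/hbase-collection
solrctl collection --create hbase-collection -s 1

# Register a Lily HBase indexer linking the HBase table to the collection
# (morphline-hbase-mapper.xml and the ZK addresses are placeholders)
hbase-indexer add-indexer \
  --name myIndexer \
  --indexer-conf morphline-hbase-mapper.xml \
  --connection-param solr.zk=localhost:2181/solr \
  --connection-param solr.collection=hbase-collection \
  --zookeeper localhost:2181
```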

HBase Functions Cheat Sheet

HBase Functions Cheat Sheet. SHELL: [cloudera@quickstart ~]$ hbase shell. LIST: hbase(main):003:0> list. SCAN: Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, COLUMNS, or CACHE. If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty, as in 'col_family:'. hbase(main):012:0> scan 'myFirstTable'. SCAN WITH FILTER: hbase(main):079:0> scan 'sales_fact', { FILTER => "KeyOnlyFilter()" } […]
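
A few more scan variations in the same vein, shown as an illustrative HBase shell transcript (the 'sales_fact' table is carried over from the cheat sheet; the 'cf' column family, row keys, and timestamps are assumptions, not the post's values):

```
hbase(main):001:0> scan 'sales_fact', { COLUMNS => ['cf:amount'], LIMIT => 5 }
hbase(main):002:0> scan 'sales_fact', { STARTROW => 'row100', STOPROW => 'row200' }
hbase(main):003:0> scan 'sales_fact', { FILTER => "PrefixFilter('row1')" }
hbase(main):004:0> scan 'sales_fact', { TIMERANGE => [1400000000000, 1500000000000] }
```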

HBase Shell Commands in Practice

In our previous posts we have seen the HBase Overview and HBase Installation; now it is time to practice some HBase shell commands to get familiar with HBase. We will test a few HBase shell commands in this post. HBase Shell Usage: Quote all names in the HBase shell, such as table and column names. Commas delimit command parameters. Type <RETURN> after entering a command to run it. Dictionaries of configuration used […]
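
As a quick taste of the commands practiced in the post, a minimal create/put/get/scan session looks like the transcript below (the table and column family names are illustrative; note that all names are quoted, per the usage rule above):

```
hbase(main):001:0> create 'myFirstTable', 'cf'
hbase(main):002:0> put 'myFirstTable', 'row1', 'cf:greeting', 'hello'
hbase(main):003:0> get 'myFirstTable', 'row1'
hbase(main):004:0> scan 'myFirstTable'
hbase(main):005:0> disable 'myFirstTable'
hbase(main):006:0> drop 'myFirstTable'
```

A table must be disabled before it can be dropped, which is why the last two commands come as a pair.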

Apache Phoenix – An SQL Layer on HBase

Phoenix HBase Overview. What is Apache Phoenix? Apache Phoenix is another top-level project from the Apache Software Foundation. It provides an SQL interface to HBase, acting as an SQL layer on top of the HBase architecture, and maps the HBase data model to the relational world. Phoenix is developed in Java with the goal of putting SQL back into NoSQL databases. Need for Apache Phoenix: Hive was added to the Hadoop ecosystem […]
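
To illustrate the SQL interface, a typical Phoenix session opened with sqlline.py looks like the sketch below. The table and column names are hypothetical, not taken from the post:

```sql
-- sqlline.py <zookeeper-quorum> opens a JDBC connection to Phoenix
CREATE TABLE IF NOT EXISTS sales (
    id     BIGINT NOT NULL PRIMARY KEY,  -- maps to the HBase row key
    region VARCHAR,
    amount DECIMAL(10,2)
);

-- Phoenix uses UPSERT rather than INSERT
UPSERT INTO sales VALUES (1, 'EMEA', 120.50);

SELECT region, SUM(amount) FROM sales GROUP BY region;
```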

HBase Integration with Hive

In this post, we will discuss the setup needed for HBase integration with Hive. We will test this integration by creating some test HBase tables from the Hive shell, populating them from another Hive table, and finally verifying the contents in the HBase table. One reason to use Hive on HBase is that a lot of data sits in HBase due to its usage in a […]
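
The integration hinges on Hive's HBaseStorageHandler. A sketch of the kind of DDL such a setup builds up to, with illustrative table and column names (these are not the post's own names, and the handler JARs must be on Hive's classpath):

```sql
-- Requires the hive-hbase-handler JAR (plus HBase/ZooKeeper jars) available to Hive
CREATE TABLE hbase_sales (key STRING, amount DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:amount')
TBLPROPERTIES ('hbase.table.name' = 'sales_fact');

-- Populate from an existing Hive table, then verify from the HBase shell
-- with: scan 'sales_fact'
INSERT OVERWRITE TABLE hbase_sales SELECT id, amount FROM hive_sales;
```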

Data Collection from HTTP Client into HBase

This post provides a proof of concept of data collection from an HTTP client into HBase. In this post, we will set up a Flume agent with an HTTP source, a JDBC channel, and an AsyncHBase sink. Initially we concentrate on the POC of HTTP client data collection into HBase, and at the end of this post we will go into the details of each component used to set up this agent. Now let's create our […]
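
An agent of the shape described above (HTTP source, JDBC channel, AsyncHBase sink) can be sketched as a Flume properties file. The agent name, port, and table/column family names below are placeholders, not the post's actual values, and the target table must already exist in HBase:

```
# http-hbase-agent.properties -- illustrative names and port
agent.sources  = httpSrc
agent.channels = jdbcCh
agent.sinks    = hbaseSink

agent.sources.httpSrc.type     = http
agent.sources.httpSrc.port     = 5140
agent.sources.httpSrc.handler  = org.apache.flume.source.http.JSONHandler
agent.sources.httpSrc.channels = jdbcCh

agent.channels.jdbcCh.type = jdbc

agent.sinks.hbaseSink.type         = org.apache.flume.sink.hbase.AsyncHBaseSink
agent.sinks.hbaseSink.table        = flume_events
agent.sinks.hbaseSink.columnFamily = cf
agent.sinks.hbaseSink.serializer   = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agent.sinks.hbaseSink.channel      = jdbcCh
```

The agent would then be launched with flume-ng agent --name agent --conf-file http-hbase-agent.properties, after which JSON events can be POSTed to the configured port.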

Flume Data Collection into HBase

We will discuss the collection of data into HBase directly through a Flume agent. In our previous posts under the Flume category, we covered the setup of Flume agents for the file roll, logger, and HDFS sink types. In this post, we explore the details of the HBase sink and its setup with a live example. As we have already covered the file, memory, and JDBC channels, we will try to make […]
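
For comparison with the AsyncHBase variant, a minimal agent using the synchronous HBaseSink might look like the sketch below. The source command, table, and column family are illustrative assumptions, and the table must already exist in HBase:

```
# hbase-agent.properties -- a minimal sketch with illustrative names
agent.sources  = execSrc
agent.channels = memCh
agent.sinks    = hbaseSink

agent.sources.execSrc.type     = exec
agent.sources.execSrc.command  = tail -F /var/log/syslog
agent.sources.execSrc.channels = memCh

agent.channels.memCh.type = memory

agent.sinks.hbaseSink.type         = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbaseSink.table        = flume_events
agent.sinks.hbaseSink.columnFamily = cf
agent.sinks.hbaseSink.serializer   = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
agent.sinks.hbaseSink.channel      = memCh
```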

HBase Daemons in Pseudo-Distributed Mode

In an HBase cluster, we can start the HBase daemons with the start-hbase.sh command or individually with hbase-daemon.sh.

But in pseudo-distribution mode (hbase.cluster.distributed=false), only the HMaster daemon is started; the HRegionServer and HQuorumPeer daemons are not. When we start the daemons with start-hbase.sh, the region server daemon is not triggered because of the condition below in the start-hbase.sh script.

When we try to trigger the region server daemon through the hbase-daemon.sh command, we […]
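
For reference, the gating condition in start-hbase.sh is shaped roughly like this. This is a paraphrased sketch of HBase 0.9x-era scripts, not a verbatim quote; check the copy shipped with your own installation:

```
# sketch of the branch inside bin/start-hbase.sh
distMode=`$bin/hbase --config "$HBASE_CONF_DIR" \
    org.apache.hadoop.hbase.util.HBaseConfTool hbase.cluster.distributed | head -n 1`

if [ "$distMode" == 'false' ]; then
  # non-distributed: only the master daemon is launched
  "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" start master
else
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" start zookeeper
  "$bin"/hbase-daemon.sh  --config "${HBASE_CONF_DIR}" start master
  "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" start regionserver
fi
```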

HBase Installation in Fully Distributed Mode

This post is a continuation of the previous post on HBase installation. In the previous post we discussed HBase installation in pseudo-distributed mode, and in this post we will learn how to install and configure HBase in fully distributed mode. Prerequisites: JDK 1.6 or a later version of Java installed on each data node machine and on the name node. Hadoop 1 or Hadoop 2 installed and configured in fully distributed mode. HBase […]
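
The core of a fully distributed configuration is hbase-site.xml plus the conf/regionservers file. The hostnames and port below are placeholders, not the post's values:

```
<!-- conf/hbase-site.xml (illustrative hostnames) -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
  </property>
</configuration>
```

Each region server hostname is then listed, one per line, in conf/regionservers so start-hbase.sh knows where to launch HRegionServer daemons.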

HBase Installation in Pseudo-Distributed Mode

This post describes the procedure for HBase installation on an Ubuntu machine in pseudo-distributed mode using an HDFS configuration. Prerequisites: Java is the main prerequisite; JDK 1.6 or a later version is required to run HBase. Hadoop 1 or Hadoop 2 installed as a pseudo-distributed or fully distributed cluster. HBase Installation Procedure: Follow the steps below in order to complete the HBase installation on an Ubuntu machine. […]
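
The overall shape of the steps is sketched below. The version placeholder, paths, and HDFS port are assumptions for illustration, not the post's exact values:

```
# Unpack the HBase release and set environment variables (e.g. in ~/.bashrc)
tar -xzf hbase-<version>-bin.tar.gz -C ~/
export HBASE_HOME=~/hbase-<version>
export PATH=$PATH:$HBASE_HOME/bin

# In conf/hbase-site.xml, point HBase at the pseudo-distributed HDFS, e.g.:
#   hbase.rootdir = hdfs://localhost:8020/hbase

# Start HBase and verify the running daemons
start-hbase.sh
jps
```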

HBase Overview

HBase is Hadoop's database, and below is a high-level HBase overview. HBase Overview: What is HBase? HBase is a scalable, distributed, column-oriented database built on top of Hadoop and HDFS. Apache HBase is an open-source, non-relational database based on Google's Bigtable, a distributed storage system for structured data. HBase provides random, real-time read/write access to big data. Need for HBase: Although most […]

Review Comments

I am a PL/SQL developer, interested in moving into big data.

Neetika Singh ITA