Apache Phoenix – An SQL Layer on HBase


Phoenix HBase Overview

What is Apache Phoenix?

Apache Phoenix is another top-level project from the Apache Software Foundation. It provides an SQL interface to HBase, acting as an SQL layer on top of the HBase architecture, and it maps the HBase data model to the relational world.

Phoenix is developed in Java with the notion of putting SQL back into NoSQL databases.

Need for Apache Phoenix:

Hive was added to the Hadoop ecosystem to maintain and manage structured data in Hadoop, and it also provides an SQL-like dialect, HiveQL, to query the tables in the Hive data warehouse. But Hive doesn't support record-level operations such as insert, update and delete.

To overcome this drawback of Hive, HBase was brought into the Hadoop ecosystem to support record-level operations. But the HBase shell supports only HBase commands such as 'scan', 'get', 'put' and 'list' to scan an entire table, get a row, put a row and list tables respectively. Thus HBase introduced its own set of commands and does not support the familiar SQL interface.
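
For example, a typical HBase shell session looks like the sketch below (the 'users' table and its column family are made up for illustration):

    hbase(main):001:0> create 'users', 'cf'
    hbase(main):002:0> put 'users', 'row1', 'cf:name', 'John'
    hbase(main):003:0> get 'users', 'row1'
    hbase(main):004:0> scan 'users'
    hbase(main):005:0> list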

Use of a well-understood language like SQL makes it easier for people to use HBase. Rather than learn another proprietary API, they can simply use the language they are already used to for reading and writing their data.

To bring this advantage to HBase, Phoenix was introduced into the Hadoop ecosystem as an SQL layer on top of HBase.

For suitable workloads, Phoenix provides very high performance when compared to Hive, Cloudera Impala or OpenTSDB.

Apache Phoenix Features:

  • It is delivered as an embedded JDBC driver for HBase data.
  • Follows ANSI SQL standards whenever possible
  • Allows columns to be modeled as a multi-part row key or key/value cells.
  • Full query support with predicate push down and optimal scan key formation.
  • Versioned schema repository. Table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema.
  • DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
  • DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows (see the example after this list).
  • Compiles SQL query into a series of HBase scans, and runs those scans in parallel to produce regular JDBC result sets.
  • It integrates seamlessly with HBase, Pig, Flume and Sqoop.
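
As an illustration of the DDL and DML support above, the statements below create a table, upsert a row and query it back (the USERS table and its columns are made up for this sketch):

    -- The primary key maps to the HBase row key; the other
    -- columns are stored as key/value cells.
    CREATE TABLE IF NOT EXISTS users (
        id   BIGINT NOT NULL PRIMARY KEY,
        name VARCHAR,
        city VARCHAR
    );

    UPSERT INTO users (id, name, city) VALUES (1, 'John', 'Hyderabad');

    SELECT name, city FROM users WHERE id = 1;
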
Limitations:
  • Limited transaction support through client-side batching.
  • Joins are not completely supported; in particular, FULL OUTER JOIN and CROSS JOIN are not available.

Why is Phoenix so Fast?

  • Apache Phoenix breaks up SQL queries into multiple HBase scans and runs them in parallel.
  • Phoenix is developed with the notion of bringing the computation to the data by using:
    • Coprocessors to perform operations on the server-side thus minimizing client/server data transfer
    • Custom filters to prune data as close to the source as possible
  • In addition, to minimize any start-up cost, Phoenix uses native HBase APIs rather than going through the MapReduce framework.
  • Phoenix leverages the HBase custom filters below to provide higher performance.
    • The Essential Column Family filter improves performance when a Phoenix query filters on data split across multiple column families (cf), by loading only the essential column family in the first pass. In a second pass, the remaining column families are loaded as needed.
    • Phoenix's Skip Scan Filter leverages the SEEK_NEXT_USING_HINT feature of the HBase Filter API. It significantly improves point queries over key columns.
    • Salting in Phoenix improves both read and write performance by prepending an extra hash byte to the row key and pre-splitting the data into a number of regions. This eliminates hot-spotting of a single or a few region servers (see the example after this list).
  • In a single line, we can say that direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
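
For example, salting is enabled at table-creation time with the SALT_BUCKETS table property. The sketch below uses a made-up EVENTS table for illustration:

    -- Pre-split EVENTS into 8 salted regions; Phoenix prepends a
    -- one-byte hash of the row key so writes spread across regions.
    CREATE TABLE events (
        event_id   BIGINT NOT NULL PRIMARY KEY,
        event_type VARCHAR
    ) SALT_BUCKETS = 8;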

For aggregate queries, coprocessors perform partial aggregation on their local region servers and return only the relevant data to the client.

Phoenix also supports secondary indexes to improve the performance of queries on non-row-key columns, as shown in the sketch below.
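
A minimal sketch of both ideas, using a hypothetical ORDERS table (the table and its columns are assumptions for illustration):

    -- Partial aggregation runs server-side in each region's
    -- coprocessor; only partial results travel to the client.
    SELECT customer_id, SUM(amount)
    FROM orders
    GROUP BY customer_id;

    -- A secondary index on a non-row-key column lets the query
    -- below avoid a full table scan.
    CREATE INDEX idx_orders_status ON orders (status);

    SELECT order_id FROM orders WHERE status = 'SHIPPED';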

Apache Phoenix Installation/Configuration:

Apache Phoenix installation on HBase is simple and straightforward, but we need to pick the right version of Phoenix that matches the HBase version on our Hadoop cluster. The version compatibility matrix is below.

  • Phoenix 3.x – HBase 0.94.x
  • Phoenix 4.x – HBase 0.98.1+

In this post, we need it for HBase version 0.98.2, so we use Phoenix 4.1.0. We need to perform the activities below.

  • Download the Phoenix binary tarball from the Apache download mirrors and extract it into our preferred installation directory (usually /usr/lib/phoenix).
  • Copy the phoenix-4.1.0-server.jar file into the HBase lib directory on each HBase region server machine. The terminal commands for performing these two actions might look like the sketch below.
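
A sketch of those commands, assuming HBase libraries live under /usr/lib/hbase/lib (the mirror URL, paths and exact jar name, e.g. a -hadoop2 suffix, may differ on your cluster):

    $ wget http://archive.apache.org/dist/phoenix/phoenix-4.1.0/bin/phoenix-4.1.0-bin.tar.gz
    $ sudo mkdir -p /usr/lib/phoenix
    $ sudo tar -xzf phoenix-4.1.0-bin.tar.gz -C /usr/lib/phoenix --strip-components=1
    # copy the server jar to every region server's HBase lib directory
    $ sudo cp /usr/lib/phoenix/phoenix-4.1.0-server.jar /usr/lib/hbase/lib/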

  • Add the Phoenix installation directory to the .bashrc file. Add the environment variables below after opening it with $ gedit ~/.bashrc.
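
The entries might look like this, assuming the installation directory chosen above:

    # Phoenix environment (appended to ~/.bashrc)
    export PHOENIX_HOME=/usr/lib/phoenix
    export PATH=$PATH:$PHOENIX_HOME/bin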


  • Restart all HBase region servers and reopen the terminal to pick up the .bashrc changes. The commands below restart HBase in the QuickStart VM (a sketch; service names can vary across distributions and versions).
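
    $ sudo service hbase-master restart
    $ sudo service hbase-regionserver restart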

The installation is now complete.

Installation of Phoenix on Cloudera QuickStart VM

By default, the Cloudera QuickStart VM ships with many Hadoop ecosystem tools installed and configured in pseudo-distributed mode, but Apache Phoenix was not yet an officially supported Cloudera tool at the time of writing this post. So we need to install Apache Phoenix manually on the Cloudera QuickStart VM.

Cloudera-QuickStart-VM-5.3.0 integrates best with Phoenix 4.3.0. (We tried Cloudera QuickStart VM 5.4.0 and 5.4.2 [HBase version 1.0.0] with vanilla Apache Phoenix 4.4.0 and 4.5.0, but there are a lot of compatibility issues.)

The successful combination is Cloudera-QuickStart-VM-5.3.0 (HBase 0.98.6) with Phoenix 4.3.0, but in this case we need to download Phoenix from Cloudera Labs.

Verify Phoenix Installation:

To connect to Phoenix, we need to specify the ZooKeeper quorum node, which in pseudo-distributed mode is localhost.

A terminal interface to execute SQL from the command line, sqlline, is bundled with Phoenix. To start it, execute the following (assuming PHOENIX_HOME is set as above):
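
    # connect sqlline to the ZooKeeper quorum (localhost in pseudo mode)
    $ $PHOENIX_HOME/bin/sqlline.py localhost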

Verify the tables and columns with the !tables and !columns commands.
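
For instance (prompt shown as sqlline prints it; output omitted, and STOCK_SYMBOL refers to the sample table created in the Run Examples section below):

    0: jdbc:phoenix:localhost> !tables
    0: jdbc:phoenix:localhost> !columns STOCK_SYMBOL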


Press Ctrl+D (or type !quit) to exit the Phoenix session.

In the Cloudera QuickStart VM, the verification steps are the same once the Cloudera Labs Phoenix package is installed.

Run Examples:

We can run some sample queries from the .sql files present in the $PHOENIX_HOME/examples directory.
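
For example, the bundled STOCK_SYMBOL.sql script creates the sample table, upserts a few rows and selects them back. It can be run with the psql.py utility (a sketch, assuming PHOENIX_HOME as set above):

    $ $PHOENIX_HOME/bin/psql.py localhost $PHOENIX_HOME/examples/STOCK_SYMBOL.sql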

With this, we have successfully configured Phoenix, created the sample table STOCK_SYMBOL, inserted rows into it and queried the values back.



