HBase Shell Commands in Practice


In our previous posts we covered the HBase Overview and HBase Installation; now it is time to practice some HBase shell commands to get familiar with HBase. We will test a few of them in this post.

HBase Shell Usage

  • Quote all names in HBase Shell such as table and column names.
  • Commas delimit command parameters.
  • Type <RETURN> after entering a command to run it.
  • Dictionaries of configuration used in the creation and alteration of tables are Ruby Hashes. They look like this:

{'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curly braces.

  • Key/values are delimited by the ‘=>’ character combination.
  • Usually keys are predefined constants such as NAME, VERSIONS, COMPRESSION, etc.
  • Constants do not need to be quoted. Type ‘Object.constants’ to see a (messy) list of all constants in the environment.
  • If you are using binary keys or values and need to enter them in the shell, use a double-quoted hexadecimal representation. For example:
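The HBase shell's built-in help illustrates this with hexadecimal and octal escapes inside double quotes (table and key names are placeholders):

```
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
```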

The HBase shell is the (J)Ruby IRB with the HBase-specific commands described below added.

HBase Shell Commands

HBase shell commands can be categorized into the following types.

  • HBase Shell General Commands
  • Data Definition Commands
  • Data Manipulation Commands
  • Other HBase Shell Commands

General Commands

  • status – shows the cluster status
  • table_help – help on table-reference commands (scan, put, get, disable, drop, etc.)
  • version – displays HBase version
  • whoami – shows the current HBase user.

DDL Commands

alter

Add, modify, or delete column families, as well as change table configuration.

Add/Change column family

For example, to change or add the 'f1' column family in table 't1' so that it keeps a maximum of 5 cell VERSIONS, do:
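As in the HBase shell help (table 't1' and family 'f1' are illustrative):

```
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
```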

We can also operate on several column families at once:
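For example, modifying 'f1' while adding or changing 'f2' and 'f3' in one command:

```
hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}
```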

Delete column family

To delete the ‘f1’ column family in table ‘ns1:t1’, use one of:
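Both forms below come from the HBase shell help:

```
hbase> alter 'ns1:t1', NAME => 'f1', METHOD => 'delete'
hbase> alter 'ns1:t1', 'delete' => 'f1'
```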

Alter Table Properties

We can also change table-scope attributes like MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end;
for example, to change the max size of a region to 128MB, do:
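For example (128MB expressed in bytes):

```
hbase> alter 't1', MAX_FILESIZE => '134217728'
```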

alter_async

Same as alter command but does not wait for all regions to receive the schema changes.

alter_status

Gets the status of the alter command. Indicates the number of regions of the table that have received the updated schema. Examples:
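With or without a namespace prefix (table names are illustrative):

```
hbase> alter_status 't1'
hbase> alter_status 'ns1:t1'
```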

create

Used for creating tables. Pass a table name, a set of column family specifications (at least one), and, optionally, table configuration as arguments.

Examples:

1. Create a table with namespace=ns1 and table qualifier/name=t1
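For example:

```
hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}
```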

2. Create a table with namespace=default and table qualifier=t1
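Either the full dictionary form or the string shorthand can be used:

```
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> create 't1', 'f1', 'f2', 'f3'
```

The second command is shorthand for the first.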

3. Table configuration options can be put at the end.
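For example, pre-splitting a table or attaching custom metadata ('mykey' is a placeholder):

```
hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
```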

4. We can also keep around a reference to the created table.
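For example:

```
hbase> t1 = create 't1', 'f1'
```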

This returns a reference to the table named 't1', on which we can then call methods such as t1.scan and t1.get.

describe

Prints the schema of a table. We can also use the abbreviation 'desc' for the same thing.
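For example:

```
hbase> describe 't1'
hbase> desc 'ns1:t1'
```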

disable

Disables an existing HBase table. Disabled tables are not deleted from HBase, but they are unavailable for regular access. A disabled table is excluded from the output of the list command, and no commands other than enable or drop can be run against it. Disabling is similar to deleting a table temporarily.
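For example:

```
hbase> disable 't1'
```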

disable_all

Disable all tables matching the given regex:
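For example, disabling every table whose name starts with 't':

```
hbase> disable_all 't.*'
```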

drop

Dropping an HBase table deletes it permanently. To drop a table, it must first be disabled.
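For example:

```
hbase> drop 't1'
```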

drop_all

Drop all tables matching the given regex:
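For example:

```
hbase> drop_all 't.*'
```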

enable

Used to enable a table which might be currently disabled.
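For example:

```
hbase> enable 't1'
```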

enable_all

Enable all tables matching the given regex:
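For example:

```
hbase> enable_all 't.*'
```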

exists

Checks for the existence of an HBase table.
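For example:

```
hbase> exists 't1'
```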

get_table

Takes the given table name and returns it as an actual object to be manipulated by the user.
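For example, binding table 't1' to the variable t1:

```
hbase> t1 = get_table 't1'
```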

On a table reference we can perform all the actions of a table, as shown below.

We can call ‘put’ on the table: it puts a row ‘r’ with column family ‘cf’, column ‘q’ and value ‘v’ into table t1.
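For example:

```
hbase> t1.put 'r', 'cf:q', 'v'
```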

To read the data back, we can scan the table with the command below, which reads all the rows in table 't1'.
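For example:

```
hbase> t1.scan
```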

Essentially, any command that takes a table name can also be done via table reference. Other commands include things like: get, delete, deleteall, get_all_columns, get_counter, count, incr. These functions, along with
the standard JRuby object methods are also available via tab completion.

We can also do general admin actions directly on a table; things like enable, disable, flush and drop just by typing:
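For example, on a reference t1 obtained earlier:

```
hbase> t1.enable
hbase> t1.flush
hbase> t1.disable
hbase> t1.drop
```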

is_disabled

To know whether an HBase table is disabled or not.
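For example:

```
hbase> is_disabled 't1'
```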

is_enabled

To know whether an HBase table is enabled or not.
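For example:

```
hbase> is_enabled 't1'
```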

list

List all tables in HBase. An optional regular expression parameter can be used to filter the output. Examples:
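The patterns below are illustrative:

```
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
```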

show_filters

Show all the filters in HBase. Example:
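```
hbase> show_filters
```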

Namespace Commands

All the commands below are self-explanatory.

  • alter_namespace

Alter namespace properties.

To add/modify a property:

To delete a property:
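Following the HBase shell help, with 'PROPERTY_NAME' standing in for a real property:

```
hbase> alter_namespace 'ns1', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
hbase> alter_namespace 'ns1', {METHOD => 'unset', NAME => 'PROPERTY_NAME'}
```

The first form adds or modifies a property; the second deletes it.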

  • create_namespace

Create namespace; pass namespace name, and optionally a dictionary of namespace configuration.
Examples:
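With and without a configuration dictionary ('PROPERTY_NAME' is a placeholder):

```
hbase> create_namespace 'ns1'
hbase> create_namespace 'ns1', {'PROPERTY_NAME' => 'PROPERTY_VALUE'}
```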

  • describe_namespace

  • drop_namespace
  • list_namespace
  • list_namespace_tables
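The remaining namespace commands can be invoked as follows (namespace 'ns1' is illustrative):

```
hbase> describe_namespace 'ns1'
hbase> drop_namespace 'ns1'
hbase> list_namespace
hbase> list_namespace_tables 'ns1'
```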

DML Commands

append

Appends a cell ‘value’ at specified table/row/column coordinates.
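For example ('mykey' is a placeholder attribute name):

```
hbase> append 't1', 'r1', 'c1', 'value'
hbase> append 't1', 'r1', 'c1', 'value', ATTRIBUTES => {'mykey' => 'myvalue'}
```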

The same commands also can be run on a table reference. Suppose you had a reference
t to table ‘t1’, the corresponding command would be:
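```
hbase> t.append 'r1', 'c1', 'value'
```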

count

Count the number of rows in a table. The return value is the number of rows. This operation may take a LONG time (run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job instead). The current count is shown every 1000 rows by default; the count interval may be optionally specified. Scan caching is enabled on count scans by default, with a default cache size of 10 rows. If your rows are small in size, you may want to increase this parameter. Examples:
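As in the HBase shell help:

```
hbase> count 't1'
hbase> count 't1', INTERVAL => 100000
hbase> count 't1', CACHE => 1000
hbase> count 't1', INTERVAL => 10, CACHE => 1000
```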

The same commands also can be run on a table reference. Suppose you had a reference t to table ‘t1’, the corresponding commands would be:
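```
hbase> t.count
hbase> t.count INTERVAL => 100000
hbase> t.count CACHE => 1000
```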

delete

Put a delete cell value at specified table/row/column and optionally timestamp coordinates. Deletes must match the deleted cell’s coordinates exactly. When scanning, a delete cell suppresses older versions. To delete a cell from ‘t1’ at row ‘r1’ under column ‘c1’ marked with the time ‘ts1’, do:
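Where ts1 stands for a concrete timestamp value:

```
hbase> delete 't1', 'r1', 'c1', ts1
```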

deleteall

Delete all cells in a given row; pass a table name, row, and optionally a column and timestamp. Examples:
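Where ts1 stands for a concrete timestamp value:

```
hbase> deleteall 't1', 'r1'
hbase> deleteall 't1', 'r1', 'c1'
hbase> deleteall 't1', 'r1', 'c1', ts1
```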

get

Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp, timerange and versions. Examples:
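The examples below follow the HBase shell help; ts1 and ts2 stand for concrete timestamps:

```
hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
hbase> get 't1', 'r1', 'c1'
```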

Besides the default ‘toStringBinary’ format, ‘get’ also supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the get specification. The FORMATTER can be stipulated:

1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g., toInt, toString)
2. or as a custom class followed by a method name: e.g., 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
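```
hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt']}
```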

Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot specify a FORMATTER for all columns of a column family.

get_counter

Return a counter cell value at specified table/row/column coordinates.
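For example:

```
hbase> get_counter 't1', 'r1', 'c1'
```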

incr

Increments a cell ‘value’ at specified table/row/column coordinates. To increment a cell value in table ‘ns1:t1’ or ‘t1’ at row ‘r1’ under column ‘c1’ by 1 (can be omitted) or 10 do:
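For example:

```
hbase> incr 'ns1:t1', 'r1', 'c1'
hbase> incr 't1', 'r1', 'c1', 1
hbase> incr 't1', 'r1', 'c1', 10
```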

put

Put a cell ‘value’ at specified table/row/column and optionally timestamp coordinates. To put a cell value into table ‘ns1:t1’ or ‘t1’ at row ‘r1’ under column ‘c1’ marked with the time ‘ts1’, do:
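Where ts1 stands for a concrete timestamp value:

```
hbase> put 'ns1:t1', 'r1', 'c1', 'value'
hbase> put 't1', 'r1', 'c1', 'value', ts1
```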

scan

Scan a table; pass a table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, COLUMNS, or CACHE.

If no columns are specified, all columns will be scanned. To scan all members of a column family, leave the qualifier empty as in ‘col_family:’.

The filter can be specified in two ways:
1. Using a filterString
2. Using the entire package name of the filter.

Some examples:
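The examples below follow the HBase shell help; the last two show a filterString and a fully qualified filter class, respectively:

```
hbase> scan 'hbase:meta'
hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
```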

For setting operation attributes:
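For example ('mykey' is a placeholder attribute name):

```
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
```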

For experts, there is an additional option, CACHE_BLOCKS, which switches block caching for the scanner on (true) or off (false). By default it is enabled. Examples:
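```
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
```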

truncate

Disables, drops, and recreates the specified table. After truncating an HBase table, the schema is preserved but the records are not.
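For example:

```
hbase> truncate 't1'
```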

truncate_preserve

Disables, drops, and recreates the specified table while still maintaining the previous region boundaries.
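For example:

```
hbase> truncate_preserve 't1'
```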

Admin Commands

  • assign
  • balance_switch
  • balancer
  • catalogjanitor_enabled
  • catalogjanitor_run
  • catalogjanitor_switch
  • close_region
  • compact
  • flush
  • hlog_roll
  • major_compact
  • merge_region
  • move
  • split
  • trace
  • unassign
  • zk_dump

Replication Commands

  • add_peer
  • disable_peer
  • enable_peer
  • list_peers
  • list_replicated_tables
  • remove_peer
  • set_peer_tableCFs
  • show_peer_tableCFs

Snapshot Commands

  • clone_snapshot
  • delete_snapshot
  • list_snapshots
  • rename_snapshot
  • restore_snapshot
  • snapshot

Security Commands

grant

Grant users specific rights.

The permissions argument is zero or more letters from the set "RWXCA": READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A').

Note: A namespace name must always be preceded by the '@' character.
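The examples below follow the HBase shell help; user, table, and column names are illustrative:

```
hbase> grant 'bobsmith', 'RWXCA'
hbase> grant '@admins', 'RWXCA', '@ns1'
hbase> grant 'bobsmith', 'RWXCA', 't1', 'f1', 'col1'
hbase> grant 'bobsmith', 'RW', 'ns1:t1', 'f1', 'col1'
```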

revoke

Revoke a user’s access rights.
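For example:

```
hbase> revoke 'bobsmith'
hbase> revoke '@admins', '@ns1'
hbase> revoke 'bobsmith', 't1', 'f1', 'col1'
```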

user_permission

Show all permissions for the particular user.
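It can be run for all tables, a namespace, a specific table, or a table-name regex:

```
hbase> user_permission
hbase> user_permission '@ns1'
hbase> user_permission 'table1'
hbase> user_permission 'namespace1:table1'
hbase> user_permission '.*'
```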

Visibility labels Commands

  • add_labels
  • clear_auths
  • get_auths
  • set_auths
  • set_visibility

whoami

Shows the current HBase user.

 


About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, involved with several complex engagements. Technical strengths include Hadoop, YARN, MapReduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.

