Hive Interactive Shell Commands


In our previous post, we discussed Hive CLI commands. In this post we continue the same topic with Hive interactive shell commands and a few examples of these options.

Hive Interactive Shell Commands

By default, Hive enters interactive shell mode if we do not use the -e or -f options. Unlike batch commands, interactive shell commands must be terminated with a semicolon (;).

The table below lists the interactive shell commands with a short description of each.

Command                          Description
quit, exit                       Leave the interactive shell.
set <key>=<value>                Set the value of a configuration property/variable.
set                              Print the configuration variables overridden by the user or Hive.
set -v                           Print all Hadoop and Hive configuration variables.
reset                            Reset all configuration properties to their default values. Any property supplied as an argument is ignored.
add FILE[S] <file>*              Add one or more files, jars, or archives to the
add JAR[S] <file>*               distributed cache.
add ARCHIVE[S] <file>*
list FILE[S]                     List all the files, jars, or archives added to the
list JAR[S]                      distributed cache.
list ARCHIVE[S]
list FILE[S] <file>*             Check whether the given resources are already
list JAR[S] <file>*              added to the distributed cache.
list ARCHIVE[S] <file>*
delete FILE[S] <file>*           Remove the given resource(s) from the
delete JAR[S] <file>*            distributed cache.
delete ARCHIVE[S] <file>*
! <cmd>                          Execute a shell command from the Hive shell.
dfs <dfs command>                Execute a dfs command from the Hive shell.
<query>                          Execute a Hive query and print the results to standard output.
source FILE <file>               Execute a script file inside the CLI.

Hive SET Command

As seen in the previous post, we can define Hive configuration properties or variables for a session with the hive --define, --hivevar, or --hiveconf options before entering the Hive session.

But the Hive SET command provides a way to override any configuration property that was already set, as well as three of the variable types in Hive: system variables, Hadoop configuration properties, and Hive configuration properties. Environment variables, however, cannot be overridden by the SET command.

We can also display all the configuration properties of Hive and Hadoop, including system and environment variables, with the SET command, and we can check the value of each property individually as well.

Override configuration properties with SET command

Below are some sample SET commands that override configuration properties. Note that we are able to change system variables as well as Hadoop and Hive configuration properties, but not environment variables.
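A sketch of such a session is shown below. The property names are standard Hadoop/Hive properties, and the exact error text for the environment-variable attempt may vary by Hive version:

```
hive> set system:user.name=testuser;          -- system variable: allowed
hive> set mapreduce.job.reduces=2;            -- Hadoop configuration property: allowed
hive> set hive.exec.mode.local.auto=true;     -- Hive configuration property: allowed
hive> set env:HOME=/tmp;                      -- environment variable: rejected
env:* variables can not be set.
```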

View Configuration Property Values with SET command

At some point during the session we may need to check the values of specific configuration properties, and this can be done with the SET command as shown below.
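Passing just the property name (without a value) to SET echoes its current value. The values shown here assume hypothetical overrides made earlier in the session:

```
hive> set mapreduce.job.reduces;
mapreduce.job.reduces=2
hive> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=true
```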

Note, however, that there is no single command that displays only the properties changed by the user: plain SET also includes the many variables that Hive itself overrides.

List all Conf Properties and Variables with SET command

If we pass no argument to the SET command, it lists the variables overridden in the current Hive session; with the -v option, all Hadoop and Hive configuration variables are listed as well.

This will list all of the following:

  • System Variables
  • Environment Variables
  • Hadoop Configuration Properties
  • Hive Configuration Properties
  • Hive variables defined with the --define and --hivevar options

Below is sample output of the SET -v command. The list is generally very long and cannot be viewed on a single terminal screen, so it is better to redirect the output to a file on the local file system and view it from there. For better readability we have omitted some of the output lines from the cat command. Note that the properties overridden above are reflected in the list.
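A minimal sketch of redirecting the dump to a file (the file path is chosen here for illustration):

```sh
$ hive -e 'set -v' > /tmp/hive-conf-dump.txt
$ wc -l /tmp/hive-conf-dump.txt        # typically several hundred lines
$ grep 'mapreduce.job.reduces' /tmp/hive-conf-dump.txt
```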

Hive RESET Command

Within a Hive session, we can reset all Hadoop and Hive variables and configuration properties to their default values with the single command RESET.

  • Providing a specific property as an argument to RESET makes no difference. Thus, we cannot reset a single/specific property to its default value in Hive; we can only reset all properties to their defaults at once.
  • Hive's RESET cannot reset system variables.

Below is a sample command sequence demonstrating the RESET functionality.
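A hypothetical sequence is sketched below; -1 is a common Hive default for mapreduce.job.reduces (Hive then decides the reducer count itself), but the default in your build may differ:

```
hive> set mapreduce.job.reduces=2;
hive> set mapreduce.job.reduces;
mapreduce.job.reduces=2
hive> reset;
hive> set mapreduce.job.reduces;
mapreduce.job.reduces=-1
```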

Hive Resources to Distributed Cache

In Hive we can add resources to (and delete them from) a Hive session; internally Hive stores these files in Hadoop's distributed cache. The resources can be files, jars, or archives. Any locally accessible file can be added to the session, and it will be available at query execution time.

Below is the syntax for adding, listing, and deleting resources in a Hive session:
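The general form, following the Hive language manual, is:

```
ADD    { FILE[S] | JAR[S] | ARCHIVE[S] } <filepath> [<filepath>]*
LIST   { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath> ...]
DELETE { FILE[S] | JAR[S] | ARCHIVE[S] } [<filepath> ...]
```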

Once a resource is added to a session, Hive queries can refer to it by its name (in MAP, REDUCE, and TRANSFORM clauses), and the resource is available locally at execution time across the entire Hadoop cluster.

  • FILE resources are just added to the distributed cache.
  • JAR resources are also added to the Java classpath, in addition to the distributed cache. This is required in order to reference the objects they contain, such as UDFs.
  • ARCHIVE resources (ZIP files, tar files, and gzipped tar files) are automatically unarchived as part of distributing them.

Examples

Below are sample examples of adding, listing, and deleting Hive resources.
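A sketch of such a session (the file and jar paths are placeholders):

```
hive> add file /tmp/sample_mapper.py;
hive> add jar /tmp/custom-udfs.jar;
hive> list files;
/tmp/sample_mapper.py
hive> list jars;
/tmp/custom-udfs.jar
hive> delete jar /tmp/custom-udfs.jar;
hive> list jars;        -- prints nothing; the jar has been removed
```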

Alternative Method to Add to Distributed Cache

If there are resources, such as JAR files, that need to be added to every Hive session, then instead of adding them each time with the ADD JAR command, we can include them in the hive-env.sh file so that they are added to every new Hive session automatically.

We need to set HIVE_AUX_JARS_PATH in the hive-env.sh file under the HIVE_CONF_DIR location, as shown below.
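For example (the jar path below is a placeholder; multiple jars can be given as a comma-separated list):

```sh
# $HIVE_CONF_DIR/hive-env.sh
export HIVE_AUX_JARS_PATH=/usr/local/hive/auxlib/custom-udfs.jar
```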

Alternatively, we can achieve the same by adding the hive.aux.jars.path property to the hive-site.xml file under HIVE_CONF_DIR, as shown below.
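For example (again, the jar path is a placeholder):

```xml
<!-- $HIVE_CONF_DIR/hive-site.xml -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/auxlib/custom-udfs.jar</value>
</property>
```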

But with this method we can only add JARs, not plain files or archives.

Running Hadoop FS Commands from Hive Shell

The Hive shell provides the ability to run Hadoop FS commands within the shell itself. Whenever we need to view input/output files or directories in HDFS, instead of leaving the Hive shell to run hadoop fs commands, we can do it from within the Hive shell with the dfs command.

This method of running Hadoop commands is more efficient than using hadoop fs at the bash shell, because the latter starts up a new JVM instance for each command, whereas Hive just runs the same code in its current process.
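A sample session is sketched below; the dfs command takes the same options as hadoop fs, and the HDFS paths are placeholders:

```
hive> dfs -ls /user/hive/warehouse;
hive> dfs -mkdir /tmp/hive_demo;
hive> dfs -cat /tmp/hive_demo/input.txt;
```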

Running Linux Shell Commands from Hive Shell

In addition to running Hadoop FS commands from the Hive shell, we can also run Linux bash shell commands within it. We type ! followed by the command and terminate the line with a semicolon (;).
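For example:

```
hive> ! pwd;
hive> ! echo "hello from the Hive shell";
hive> ! ls /tmp;
```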

Limitations
  • Interactive commands that require user input can't be issued from the Hive shell.
  • Piping output from one command to another (|) is not supported.
  • File globs are also not supported. For example, ! ls *.hql; will look for a file named *.hql;
    rather than all files that end with the .hql extension.

Customizing Hive Logging Level

Hive uses log4j for logging. By default, Hive logs are not emitted to standard output (the console); instead, Hive's error log is written on the local file system at /tmp/$USER/hive.log. This is controlled by the properties in the hive-log4j.properties file under the HIVE_CONF_DIR directory. Below are the configuration properties that control the creation of Hive log files.
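These are the relevant defaults as they appear in the hive-log4j.properties.template shipped with Hive (exact values can differ between releases):

```properties
hive.root.logger=INFO,DRFA
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
```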

Here hive.root.logger specifies the logging level (INFO) as well as the log destination (DRFA, the Daily Rolling File Appender). When the destination is DRFA, all logs are appended to a daily rolling log file at the given path.

But if we want to emit the logs to standard output and/or change the logging level, we need to rename the HIVE_CONF_DIR/hive-log4j.properties.template file to HIVE_CONF_DIR/hive-log4j.properties by removing the .template suffix, and then override the above properties as desired.

If we need to emit the logs to the console, we can update the property as below.
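In hive-log4j.properties, change the appender from DRFA to console:

```properties
hive.root.logger=INFO,console
```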

With this, we can change the logging level for all the new hive sessions.

We can even change the logging level for just one session. Although Hive configuration properties can normally be overridden with either --hiveconf or the SET command, logging properties are read at initialization time, so hive.root.logger must be passed with --hiveconf when starting the session, as given below, to set console as the logging destination and WARN as the logging level.
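For example, starting a session like this sends WARN-and-above messages to the console for that session only:

```sh
$ hive --hiveconf hive.root.logger=WARN,console
```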

Hive Shell’s Auto Complete Feature

Similar to a Linux terminal, the Hive shell supports auto-completion of keywords and function names: start typing and hit the Tab key. For example, if we type SELE and then the Tab key, the CLI will complete the word SELECT.

But if we hit the Tab key at the prompt without typing anything, we'll get a response like the one below:
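Something like the following appears (the exact count of possibilities varies by Hive version):

```
hive>
Display all 436 possibilities? (y or n)
```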

If we enter y, we'll get a long list of all the keywords and built-in functions.

Hive Shell Command History

The Hive shell saves the last 10,000 lines to the file $HOME/.hivehistory on the local file system. We can use the up and down arrow keys to scroll through previous commands. However, each previous line of input is shown separately; the CLI does not combine multi-line commands and queries into a single history entry.



About Siva

Senior Hadoop developer with 4 years of experience in designing and architecting solutions for the Big Data domain, and has been involved with several complex engagements. Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.
