Hive Authorization Models and Hive Security

In this post, we will discuss Hive authorization models and Hive security.

Before discussing Hive authorization models, let's note the difference between authentication and authorization.

Authentication – Verifying the identity of the user, that is, whether the logged-in user really is who they claim to be.

Authorization – Verifying whether a user has permission to perform a certain action.

Hive Authorization Models

By default, authorization is not enabled in Hive. However, Hive provides three authorization models for securing Hive data.

  • Hive Default Authorization

  • Hive – Storage Based Authorization (SBA)

  • Hive – SQL Standards Based Authorization (SSBA)

Hive Default Authorization

This was the only authorization model available up to the hive-0.10.0 release; later releases added the other two models mentioned above. It does not provide a complete access control model and leaves many security gaps unaddressed.

To enable Hive authorization, set the hive.security.authorization.enabled property to true in hive-site.xml.
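For a quick experiment, authorization can usually also be switched on for a single session from the Hive CLI. The sketch below assumes the legacy Hive CLI and the default authorization model; the table name auth_test is hypothetical and hadoop1 is just a sample user. HiveServer2 setups typically do not allow this property to be changed at runtime.

```sql
-- Turn on default (legacy) authorization for the current CLI session only.
SET hive.security.authorization.enabled=true;

-- With authorization on, this is expected to fail until the current user
-- holds the CREATE privilege on the target database.
CREATE TABLE auth_test (key INT, value STRING);

-- In this model any user can grant privileges, even to themselves.
GRANT CREATE ON DATABASE default TO USER hadoop1;

-- The same statement should now be allowed for hadoop1.
CREATE TABLE auth_test (key INT, value STRING);
```

The self-grant in the middle is exactly the gap that makes this model unsuitable as a real security boundary.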

This model is somewhat similar to RDBMS-style authorization, but any user can grant or revoke permissions for themselves. Hive authorization is defined at the following levels.

  • Users

  • Groups

  • Roles

Here, users and groups are the same as the user and group names in the POSIX file system, and roles are simply names given to a set of grants/permissions. Roles can be assigned to users, groups, and other roles. The following checks are made, in this order, to decide whether a user has permission to perform a Hive operation. If any one of them passes, the operation is performed.

  • User privileges (Has the privilege been granted to the user?)

  • Group privileges (Does the user belong to any groups that the privilege has been granted to?)

  • Role privileges (Does the user or any of the groups that the user belongs to have a role that grants the privilege?)
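The three checks above can be exercised by granting the same privilege at each level. This is only a sketch for the default authorization model: the table sales, the group analysts, and the role report_readers are hypothetical names, and hadoop1 is a sample user.

```sql
-- 1. User level: grant directly to one user.
GRANT SELECT ON TABLE sales TO USER hadoop1;

-- 2. Group level: every member of the group gets the privilege.
GRANT SELECT ON TABLE sales TO GROUP analysts;

-- 3. Role level: attach the privilege to a role, then hand out the role.
CREATE ROLE report_readers;
GRANT SELECT ON TABLE sales TO ROLE report_readers;
GRANT ROLE report_readers TO USER hadoop1;

-- Inspect what each principal currently holds on the table.
SHOW GRANT USER hadoop1 ON TABLE sales;
SHOW GRANT GROUP analysts ON TABLE sales;
SHOW GRANT ROLE report_readers ON TABLE sales;
```

Any one of the three grants is enough for hadoop1 to read the table, since the checks stop at the first one that passes.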

Since authorization is controlled at the user, group, and role levels, we can optionally also set properties such as hive.security.authenticator.manager and hive.security.authorization.manager in hive-site.xml. By default, the Metastore uses the HadoopDefaultAuthenticator.

Hive – Storage Based Authorization (SBA) at Metastore

In storage based authorization, Hive uses the HDFS permissions of the folders corresponding to the different metadata objects as the source of truth for the authorization policy. If we enable storage based authorization in the metastore server, then whenever a client tries to access metadata objects such as databases, tables, and partitions, the metastore checks whether the client has permission on the corresponding directories in the file system.
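To see which directory backs a table, and therefore which HDFS permissions this model would consult, something like the following can be run from the Hive CLI. The database sales_db, the table orders, and the path are hypothetical; the path assumes the default warehouse location.

```sql
-- The Location field in the output is the HDFS directory for the table.
DESCRIBE FORMATTED sales_db.orders;

-- Hive CLI passthrough to HDFS: show owner, group, and mode of that directory.
-- (Only works where the dfs command has not been disabled.)
dfs -ls -d /user/hive/warehouse/sales_db.db/orders;
```

A user who cannot read or write that directory in HDFS would be refused the matching metadata operation.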

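To enable storage based authorization on the metastore server, set the following properties in hive-site.xml: hive.metastore.pre.event.listeners (to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener), hive.security.metastore.authorization.manager (to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider), and hive.security.metastore.authenticator.manager (to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator).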

DefaultHiveMetastoreAuthorizationProvider implements the standard Hive grant/revoke model, whereas StorageBasedAuthorizationProvider uses HDFS permissions to provide authorization instead of Hive-style grant-based authorization.

Note:

It is important to realize that Hive Metastore only controls authorization for metadata, and the underlying data is controlled by HDFS, so if permissions and privileges between the two systems are not in sync, users may have access to metadata, but not the physical data. If the user -> group mappings across the Metastore and NameNode are not in sync, a user may have the privileges required to access a table according to the Metastore, but may not have permission to access the underlying files according to the NameNode.

Hive – SQL Standards Based Authorization (SSBA)

It allows Hive to be fully SQL compliant in its authorization model. This can be used in conjunction with storage based authorization on the metastore server. Clients use SQL and ODBC/JDBC through HiveServer2 and their access can be controlled using this authorization model.

Restrictions in the Model:

  • Commands such as dfs, add, delete, compile, and reset are disabled when this authorization is enabled.

  • The set commands used to change Hive configuration are restricted to a smaller safe set. This is controlled using the hive.security.authorization.sqlstd.confwhitelist configuration parameter.

  • Privileges to add or drop functions and macros are restricted to the admin role.

  • The Hive transform clause is also disabled when this authorization is enabled.

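To enable this authorization, set hive.server2.enable.doAs to false in hive-site.xml and list the users who should be added to the admin role in hive.users.in.admin.role.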

Notes:

  • Note that a user who belongs to the admin role needs to run the “set role” command before getting the privileges of the admin role, as this role is not in current roles by default.

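  • HiveServer2 can be configured to use an embedded metastore, which allows it to invoke the metastore authorization API.

Put the following configuration properties in the hiveserver2-site.xml file: hive.security.authorization.enabled (set to true), hive.security.authorization.manager (set to org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory), and hive.security.authenticator.manager (set to org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator).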

Privileges, Users and Roles:

  • Privileges can be granted to users as well as roles.

  • Users can belong to one or more roles.

The default role is public, and all users belong to it. Only the admin role can create, drop, and list all roles; any user can set a role that has been granted to them. Users who do the work of a database administrator are expected to be added to the admin role. However, a user who belongs to the admin role must run the “set role” command before getting the admin privileges, because this role is not among the current roles by default.

The current roles can be seen using the “show current roles;” command.
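A minimal sketch of switching into the admin role, assuming the session user has been added to the admin role:

```sql
-- Roles active in this session; admin is not among them by default.
SHOW CURRENT ROLES;

-- Switch to the admin role (only works for users who belong to it).
SET ROLE ADMIN;

-- admin should now appear in the list.
SHOW CURRENT ROLES;
```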

Roles can be created, dropped, listed, and set with the CREATE ROLE, DROP ROLE, SHOW ROLES, and SET ROLE commands. Creating, dropping, and listing all roles requires the admin role.

Setting role to ALL refreshes the list of current roles (in case new roles were granted to the user) and sets them to the default list of roles.
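The role commands can be sketched as follows. The role name etl_writers is hypothetical, and the session user is assumed to belong to the admin role.

```sql
-- Admin privileges are needed to create, drop, and list all roles.
SET ROLE ADMIN;

-- Create a role and confirm that it exists.
CREATE ROLE etl_writers;
SHOW ROLES;

-- Remove the role again.
DROP ROLE etl_writers;

-- Go back to the default list of roles for this user.
SET ROLE ALL;
```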

We can assign (grant) a role to, or remove (revoke) a role from, a user, a group, or another role with the GRANT and REVOKE commands.
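A short sketch of granting and revoking roles under SQL standards based authorization follows. The role names data_readers and senior_readers are hypothetical, hadoop1 is a sample user, and only USER and ROLE principals are used here.

```sql
SET ROLE ADMIN;

CREATE ROLE data_readers;
CREATE ROLE senior_readers;

-- Give the role to a user and to another role.
GRANT data_readers TO USER hadoop1;
GRANT data_readers TO ROLE senior_readers;

-- See which roles a user holds and who holds a given role.
SHOW ROLE GRANT USER hadoop1;
SHOW PRINCIPALS data_readers;

-- Take the role away from the user again.
REVOKE data_readers FROM USER hadoop1;
```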

Privilege types:

  • ALL

  • ALTER

  • UPDATE

  • CREATE

  • DROP

  • INDEX

  • LOCK

  • SELECT

  • SHOW_DATABASE

Syntax for Grant/Revoke:

GRANT priv_type [(column_list)] [, priv_type [(column_list)]] ...

[ON object_type priv_level]

TO principal_specification [, principal_specification] ...

[WITH GRANT OPTION];

REVOKE priv_type [(column_list)] [, priv_type [(column_list)]] ...

[ON object_type priv_level]

FROM principal_specification [, principal_specification] ...

REVOKE ALL PRIVILEGES, GRANT OPTION FROM user [, user] ...

object_type : TABLE | DATABASE

priv_level : table_name | db_name

principal_specification : USER user | GROUP group | ROLE role

priv_type : ALL | ALTER | UPDATE | CREATE | DROP | INDEX | LOCK | SELECT | SHOW_DATABASE

Examples:
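As an illustrative sketch of the syntax above, using the privilege names from the list in this post: the table sales, the database sales_db, and the role etl_writers are placeholder names, and hadoop1 is a sample user.

```sql
-- Grant read access and allow the grantee to pass it on.
GRANT SELECT ON TABLE sales TO USER hadoop1 WITH GRANT OPTION;

-- Grant several privileges at database level to a role.
CREATE ROLE etl_writers;
GRANT CREATE, DROP ON DATABASE sales_db TO ROLE etl_writers;

-- Check what the user now holds on the table.
SHOW GRANT USER hadoop1 ON TABLE sales;

-- Take the privilege away again.
REVOKE SELECT ON TABLE sales FROM USER hadoop1;
```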

We can also enable security at the partition level by setting the table property PARTITION_LEVEL_PRIVILEGE to TRUE when creating a partitioned table.
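A minimal sketch of partition-level privileges, assuming the default authorization model. The table page_views, its columns, and the partition value are hypothetical, and hadoop1 is a sample user.

```sql
-- Create a partitioned table with partition-level privileges switched on.
CREATE TABLE page_views (user_id INT, url STRING)
PARTITIONED BY (ds STRING)
TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");

ALTER TABLE page_views ADD PARTITION (ds='2015-01-01');

-- Privileges can then be granted or revoked per partition.
GRANT SELECT ON TABLE page_views PARTITION (ds='2015-01-01') TO USER hadoop1;
REVOKE SELECT ON TABLE page_views PARTITION (ds='2015-01-01') FROM USER hadoop1;
```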

For User hadoop1

For User user (Admin role)