In this post, we will discuss Hive authorization models and Hive security.
Before discussing the Hive authorization models, let's note the difference between authentication and authorization.
Authentication – Verifying the identity of the user, that is, whether the logged-in user is really who they claim to be.
Authorization – Verifying whether a user has permission to perform a certain action.
Hive Authorization Models
In Hive, authorization is not enabled by default. Hive provides three authorization models for securing Hive data:
Hive Default Authorization
Hive – Storage Based Authorization (SBA)
Hive – SQL Standards Based Authorization (SSBA)
Hive Default Authorization
This was the only authorization model available up to the hive-0.10.0 release. Later releases added the other two models mentioned above. This mode does not provide a complete access control model and leaves many security gaps unaddressed.
To enable Hive authorization, set the hive.security.authorization.enabled property to true in hive-site.xml.
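As a minimal sketch, assuming the standard property names from the Hive configuration documentation, the hive-site.xml entries might look like the following. The second property, which grants the table owner all privileges on tables they create, is an assumption about a typical setup rather than a requirement.

```xml
<configuration>
  <!-- Sketch only: turns on Hive's default (legacy) authorization checks. -->
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Assumed companion setting: privileges automatically granted
       to the owner of a newly created table. -->
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
</configuration>
```

In an existing hive-site.xml, only the property elements would be added inside the file's configuration element.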
This is somewhat similar to RDBMS-style authorization, but any user can grant or revoke permissions for themselves. Hive authorization is defined at different levels: users, groups, and roles.
Here, users and groups are the same as the user and group names in the POSIX file system, and roles are simply names given to a set of grants/permissions. Roles can be assigned to users, groups, and other roles. The following rules are checked in sequence to decide whether a user has permission to perform a Hive operation. If any one of the checks passes, the operation is performed.
User privileges (Has the privilege been granted to the user?)
Group privileges (Does the user belong to any groups that the privilege has been granted to?)
Role privileges (Does the user or any of the groups that the user belongs to have a role that grants the privilege?)
Since authorization is controlled at the user, group, and role levels, we can optionally set the properties below in hive-site.xml as well. By default, the Metastore uses the HadoopDefaultAuthenticator.
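A hedged sketch of such optional entries, assuming the standard createtable grant properties from the Hive configuration documentation. The names analyst_user, analysts, and analyst_role are hypothetical placeholders, and the privileges shown are only examples.

```xml
<configuration>
  <!-- Privileges automatically granted to specific users on every new table.
       Format: name:privilege, with entries separated by semicolons. -->
  <property>
    <name>hive.security.authorization.createtable.user.grants</name>
    <value>analyst_user:select</value>
  </property>
  <!-- Same idea at the group level (hypothetical group name). -->
  <property>
    <name>hive.security.authorization.createtable.group.grants</name>
    <value>analysts:select</value>
  </property>
  <!-- Same idea at the role level (hypothetical role name). -->
  <property>
    <name>hive.security.authorization.createtable.role.grants</name>
    <value>analyst_role:select</value>
  </property>
  <!-- Explicitly states the default authenticator; normally this can be omitted. -->
  <property>
    <name>hive.security.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
  </property>
</configuration>
```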
Hive – Storage Based Authorization (SBA) at Metastore
In storage based authorization, Hive uses the HDFS permissions on the folders corresponding to the different metadata objects as the source of truth for the authorization policy. If storage based authorization is enabled in the metastore server, then whenever a client tries to access metadata objects such as databases, tables, and partitions, the metastore checks whether the client has permission on the corresponding directories in the file system.
To enable storage based authorization in the metastore server, set the properties below in hive-site.xml.
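A minimal sketch of these settings, assuming the class and property names given in the Hive documentation for metastore-side security:

```xml
<configuration>
  <!-- Run authorization checks before each metastore operation. -->
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  </property>
  <!-- Authorize against HDFS permissions. The default value of this property is
       org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider. -->
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  </property>
  <!-- Resolve the calling user and groups on the metastore side. -->
  <property>
    <name>hive.security.metastore.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  </property>
</configuration>
```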
Here, the DefaultHiveMetastoreAuthorizationProvider implements the standard Hive grant/revoke model.
StorageBasedAuthorizationProvider uses HDFS permissions to provide authorization instead of using Hive-style grant-based authorization.
It is important to realize that the Hive Metastore only controls authorization for metadata, while the underlying data is controlled by HDFS. If permissions and privileges between the two systems are not in sync, users may have access to the metadata but not to the physical data. Likewise, if the user-to-group mappings across the Metastore and the NameNode are not in sync, a user may have the privileges required to access a table according to the Metastore, but may not have permission to access the underlying files according to the NameNode.
Hive – SQL Standards Based Authorization (SSBA)
This model allows Hive to be fully SQL compliant in its authorization model. It can be used in conjunction with storage based authorization on the metastore server. Clients that run SQL through HiveServer2 over ODBC/JDBC can have their access controlled using this authorization model.
Restrictions in the Model:
Commands such as dfs, add, delete, compile, and reset are disabled when this authorization is enabled.
The set commands used to change Hive configuration are restricted to a smaller safe set. This is controlled using the hive.security.authorization.sqlstd.confwhitelist configuration parameter.
Privileges to add or drop functions and macros are restricted to the admin role.
The Hive transform clause is also disabled when this authorization is enabled.
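To illustrate the restriction on set commands, here is a hedged sketch of extending the safe set. It assumes Hive 0.14.0 or later, where a companion property, hive.security.authorization.sqlstd.confwhitelist.append, accepts an additional Java regex; the pattern shown is purely illustrative.

```xml
<configuration>
  <!-- Assumption: Hive 0.14.0+. Parameters matching this regex can be changed with "set"
       in addition to those matched by hive.security.authorization.sqlstd.confwhitelist.
       The pattern below is only an example. -->
  <property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>hive\.exec\.max\.dynamic\.partitions.*</value>
  </property>
</configuration>
```

Appending is usually safer than overriding hive.security.authorization.sqlstd.confwhitelist itself, since overriding replaces the built-in safe set.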
To enable this authorization, add the properties below to hive-site.xml.
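A minimal sketch of the hive-site.xml side, assuming the property and class names from the Hive documentation for SQL standard based authorization (Hive 0.14 and later). The user name hive_admin is a hypothetical placeholder.

```xml
<configuration>
  <!-- Queries run as the HiveServer2 process user, not as the end user. -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <!-- Comma-separated list of users to be added to the admin role (hypothetical name). -->
  <property>
    <name>hive.users.in.admin.role</name>
    <value>hive_admin</value>
  </property>
  <!-- Allows the metastore authorization API to be called only from an embedded metastore. -->
  <property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
  </property>
  <!-- Ensures tables created through the Hive CLI get default privileges for the owner. -->
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory</value>
  </property>
</configuration>
```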
Note that a user who belongs to the admin role needs to run the “set role” command before getting the privileges of the admin role, as this role is not in the current roles by default.
HiveServer2 can be configured to use an embedded metastore, which allows it to invoke the metastore authorization API.
Put the configuration properties below in the hiveserver2-site.xml file.
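A hedged sketch of the hiveserver2-site.xml entries, again assuming the names given in the Hive documentation:

```xml
<configuration>
  <!-- Enforce SQL standard based authorization inside HiveServer2. -->
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <!-- Use the user of the HiveServer2 session as the authenticated user. -->
  <property>
    <name>hive.security.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
  </property>
  <!-- Left blank so that HiveServer2 uses an embedded metastore. -->
  <property>
    <name>hive.metastore.uris</name>
    <value> </value>
  </property>
</configuration>
```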