
I am trying to secure Hive using storage-based authorization, with Kerberos for authentication and LDAP for users/groups.

What I want is for Hive to create directories and files in HDFS as the connecting user (and their primary group). That way I hope to restrict access to databases based on group membership.

So, for example, when I am authenticated as user 'import' (member of group 'imports') via Kerberos using kinit (import@REALM) and run 'CREATE DATABASE test;' through beeline (the exact commands are sketched below the listings), I expect to see:

drwxr-x---   - import     imports             0 2015-08-28 10:16 /user/hive/warehouse/test.db

But what I am getting is:

drwxr-x---   - hive       data                0 2015-08-28 10:16 /user/hive/warehouse/test.db

Note that the warehouse directory permission is:

drwxrwxr-t   - hive       data                0 2015-08-28 11:14 /user/hive/warehouse
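For completeness, the commands I run are roughly the following (the JDBC URL assumes HiveServer2 on master.dev.data with the default port 10000):

# Authenticate as the end user
kinit import@REALM

# Connect to HiveServer2 over Kerberos and create a database
beeline -u "jdbc:hive2://master.dev.data:10000/default;principal=hive/master.dev.data@REALM" \
        -e "CREATE DATABASE test;"

# Inspect the resulting ownership in HDFS
hadoop fs -ls /user/hive/warehouse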

I have also noticed that when I change the ownership manually using hadoop fs -chown, I can still drop databases that are not owned by me! On the other hand, files uploaded with hadoop fs -put do get the correct owner and group.
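Concretely, that test looked roughly like this ('otheruser:othergroup' and data.csv are just placeholders, and the chown is done as an HDFS superuser):

# Hand the database directory to somebody else (run as an HDFS superuser)
hadoop fs -chown -R otheruser:othergroup /user/hive/warehouse/test.db

# Dropping it from beeline as 'import' still succeeds, which I did not expect
beeline -u "jdbc:hive2://master.dev.data:10000/default;principal=hive/master.dev.data@REALM" \
        -e "DROP DATABASE test CASCADE;"

# A plain HDFS upload, by contrast, is owned by import:imports as expected
hadoop fs -put data.csv /user/import/
hadoop fs -ls /user/import/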

Is this possible in Hive at all?

My current config is:

core-site.xml:

<property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
</property>
<property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
</property>

<property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>localhost,master.dev.data</value>
</property>
<property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
</property>

hdfs-site.xml:

<property>
    <name>fs.permissions.umask-mode</name>
    <value>027</value>
</property>

hive-site.xml:

<property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <description>true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from dfs umask</description>
    <value>false</value>
</property>

<property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<property>
    <name>hive.security.metastore.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
    <name>hive.security.metastore.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>

<property>
    <name>hive.security.metastore.authorization.auth.reads</name>
    <value>true</value>
</property>

<property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
</property>

<property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
</property>

I have omitted the Kerberos keytab/principal configs, but HiveServer2 and the metastore use the 'hive/master.host@REALM' principal, YARN/HDFS are Kerberos-enabled as well, and all nodes get passwd/group information from LDAP.
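The omitted Kerberos properties follow the usual pattern, roughly like this (the keytab path below is a placeholder, not my actual one):

<!-- hive-site.xml: Kerberos for HiveServer2 and the metastore (placeholder values) -->
<property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
</property>
<property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/master.host@REALM</value>
</property>
<property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/etc/hive/conf/hive.keytab</value>
</property>
<property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
</property>
<property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/master.host@REALM</value>
</property>
<property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/hive/conf/hive.keytab</value>
</property>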

Versions:

hadoop-0.20-mapreduce-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-client-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-hdfs-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-hdfs-namenode-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-mapreduce-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-mapreduce-historyserver-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-yarn-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hadoop-yarn-resourcemanager-2.6.0+cdh5.4.4+597-1.cdh5.4.4.p0.6.el6.x86_64
hive-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
hive-jdbc-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
hive-metastore-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
hive-server2-1.1.0+cdh5.4.4+157-1.cdh5.4.4.p0.6.el6.noarch
It looks like the MapReduce job itself is actually impersonated: if I disable doAs, any query that needs to run MapReduce over the data fails because the hive user is not whitelisted for the YARN containers, which means the jobs normally do run as the correct user. But all non-MapReduce operations (CREATE TABLE/DATABASE) are still executed as the hive user, which creates permission issues since the data ends up owned by someone else. – user16611 Sep 03 '15 at 11:17

1 Answer


Can you try starting HiveServer2 with an embedded metastore?

hiveserver2 -hiveconf hive.metastore.uris=' ' ..

That might be a workaround, and it would tell us whether this is a bug only in remote-metastore mode.

(For historical reasons, we at Hortonworks have stuck with running the metastore embedded in HS2, and our system tests run in that mode; I haven't seen this issue there.)
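To check quickly, start HS2 that way and repeat the experiment as the end user; the database name and JDBC URL below are just placeholders for your cluster:

# Start HS2 with an embedded metastore (keep your other options)
hiveserver2 -hiveconf hive.metastore.uris=' ' &

# As the end user, create a database and look at the owner of the new directory
kinit import@REALM
beeline -u "jdbc:hive2://master.dev.data:10000/default;principal=hive/master.dev.data@REALM" \
        -e "CREATE DATABASE perm_test;"
hadoop fs -ls /user/hive/warehouse | grep perm_test

If impersonation takes effect end to end, the new directory should show up as import:imports instead of hive:data.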
