I am trying to create Hive tables as the output of my Spark (version 1.5.1) job on a Hadoop cluster (BigInsights 4.1 distribution) and am running into permission issues. My guess is that Spark uses a default user (in this case 'yarn', not the job submitter's username) to create the tables, and therefore fails.

I tried customizing the hive-site.xml file to set an authenticated user that has permission to create Hive tables, but that didn't work.

I also tried setting the Hadoop user variable to an authenticated user, but that didn't work either.

I want to avoid saving text files and then creating Hive tables from them, both to improve performance and to reduce the size of the output through ORC compression.

My questions are:

  • Is there any way to call the write function of the Spark DataFrame API as a specified user?
  • Is it possible to choose a username in Oozie's workflow file?
  • Does anyone have an alternative idea, or has anyone faced this problem before?

Thanks. Hatak!

Hatak
  • Yes, Oozie allows you to set the username. However, hive-site.xml should be set up to support user impersonation. – OneCricketeer Oct 26 '17 at 19:43
  • Thanks a lot. Based on your answer, I found that the property "hive.server2.enable.doAs" has to be set to true in hive-site.xml. But I think this has to be done in the XML file located on the cluster, not the one embedded in my Java classpath, right? – Hatak Oct 26 '17 at 20:39
  • As far as I know, it needs to be set in both places. One tells the server to pass the user information through to the disk. The other tells the client to pass it to the server. The default value is true, by the way. – OneCricketeer Oct 27 '17 at 00:15
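
For reference, the property discussed in these comments would look roughly like this in hive-site.xml. This is only a sketch of the fragment: the property name and value are taken from the comments above, and according to them it may need to be present both in the cluster's file and in the client's copy. Whether this alone resolves the permission issue is not confirmed in this thread.

```xml
<!-- hive-site.xml fragment (sketch): enables user impersonation as described in the comments -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```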

1 Answer

Assuming df holds your data, you can write it to a table as follows.

In Java:

df.write().saveAsTable("tableName");

You can also use a different SaveMode, such as Overwrite or Append:

df.write().mode(SaveMode.Append).saveAsTable("tableName");

In Scala:

df.write.mode(SaveMode.Append).saveAsTable("tableName")

Many other options can be specified depending on the format you would like to save in, such as text, ORC (with buckets), or JSON.

Vinay Limbare
  • Thank you for your reply. I already know how to save a DataFrame as a Hive table; my problem is about permissions. My job does not have the rights to write to the Hive metastore, and I am trying to find a solution to that. – Hatak Oct 26 '17 at 20:35