I am trying to create hive tables as outputs of my spark (1.5.1 version) job on a hadoop cluster (BigInsight 4.1 distribution) and am facing permission issues. My guess is spark is using a default user (in this case 'yarn' and not the job submitter's username) to create the tables and therefore fails to do so.
I tried to customize the hive-site.xml file to set an authenticated user that has permissions to create hive tables, but that didn't work.
I also tried to set Hadoop user variable to an authenticated user but it didn't work either.
I want to avoid saving txt files and then creating hive tables to optimize performances and reduce the size of the outputs through orc compression.
My questions are :
- Is there any way to call write function of the spark dataframe api with a specified user ?
- Is it possible to choose a username using oozie's workflow file ?
- Does anyone have an alternative idea or has ever faced this problem ?
Thanks. Hatak!