
I call kinit with a keytab right before spark-submit in my shell driver script. The thing is, it works by itself, but when I call the shell driver script through Oozie, I get this error:

Stdoutput py4j.protocol.Py4JJavaError: An error occurred while calling o49.saveAsTextFile.
Stdoutput : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication

The issue is probably here:

file.coalesce(1,True).saveAsTextFile(FQDNofHadoop+output) 

EDIT: My script contains: kinit -k -t /home/me/me.keytab me@DOMAIN.HAD
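
For context, a minimal sketch of that kind of driver script is below (the submit options and the script name my_job.py are placeholders, not from the original post):

#!/bin/bash
# Obtain a Kerberos TGT from the keytab before launching Spark.
# Note: when this script is launched by Oozie, the kinit alone was not
# enough -- the delegation-token error above still appeared.
kinit -k -t /home/me/me.keytab me@DOMAIN.HAD

# Placeholder submit command; my_job.py stands in for the real PySpark script.
spark-submit --master yarn my_job.py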

EDIT: Working solution:

I used

spark-submit --principal 'me@DOMAIN.HAD' \
  --keytab '/home/me/me.keytab' \

and execution of the pyspark script in Oozie worked with no error (even writing into a Hive table and so on). The log4j logger didn't work (with standalone scripts it does), but at least print() (stdout into the YARN logs) did ...
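
Put together, the working submit probably looks something like the sketch below (the YARN master option and the script name my_job.py are assumptions added for illustration):

# With --principal/--keytab, Spark logs in from the keytab itself and can
# obtain and renew the delegation tokens, so the job no longer depends on
# the ticket cache of the Oozie launcher.
spark-submit \
  --master yarn \
  --principal 'me@DOMAIN.HAD' \
  --keytab '/home/me/me.keytab' \
  my_job.py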

Thx

la_femme_it
  • Check to make sure your shell driver script knows the absolute path to the keytab. – T-Heron Jul 28 '17 at 16:43
  • In my script I have: kinit -k -t /home/me/me.keytab me@DOMAIN.HAD – la_femme_it Jul 28 '17 at 17:01
  • Well, it was worth a shot. Try this: Use 'UserGroupInformation.loginUserFromKeytab' inside your class (or run 'kinit -kt' command at the start of shell script). – T-Heron Jul 28 '17 at 17:11
  • Actually I did; I run kinit -kt, and the TGT should also be on all worker nodes... – la_femme_it Jul 28 '17 at 17:13
    _"the TGT should be on all worker nodes"_ >> what does that mean?!? You can upload the *keytab* to HDFS and have Oozie download it at run-time using a `` instruction; then you can `kinit`to create a **local** ticket in the server-wide cache for your user -- or a local ticket in a private cache (using `kinit` w/ custom `KRB5CCNAME`) -- or a volatile session ticket using either Spark `--principal / --keytab` or custom code invoking Hadoop `UserGroupInformation` – Samson Scharfrichter Jul 28 '17 at 21:17
  • But note that `UserGroupInformation` would work for code running in the driver, not in the executors, because default authentication tokens are created & broadcast *before* the driver is started, hence before you can execute custom code. – Samson Scharfrichter Jul 28 '17 at 21:20
  • I used --principal 'me@DOMAIN.HAD' \ --keytab '/home/me/me.keytab' \ and execution of the pyspark script worked with no error (even writing into a Hive table and so on). The logger didn't work, but at least print() did ... – la_femme_it Aug 01 '17 at 14:43
  • @la_femme_it - You can make your own answer using the working solution you added. This way the question is clearly marked as answered to better help others just scanning through the Summaries only. – T-Heron Nov 20 '17 at 15:30
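
For reference, a rough sketch of the private-cache variant mentioned in the comments above (the cache path and the assumption that the workflow ships me.keytab into the action's working directory are illustrative only; the approach that actually worked here is the --principal/--keytab one in the answer below):

# Inside the Oozie-launched shell action, assuming me.keytab was shipped
# into the working directory (e.g. via an Oozie <file> entry).
export KRB5CCNAME=/tmp/krb5cc_me_$$   # hypothetical private, per-run ticket cache
kinit -kt me.keytab me@DOMAIN.HAD     # the ticket lands only in that private cache
klist                                 # verify which cache holds the ticket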

1 Answer


I used --principal 'me@DOMAIN.HAD' \ --keytab '/home/me/me.keytab' \ and execution of the pyspark script worked with no error (even writing into a Hive table and so on). The logger didn't work, but at least print() did ...

la_femme_it