
When I try to write a DataFrame obtained by querying Hive through a HiveContext in an Oozie workflow, I get the exception below. What could be the issue?

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
    at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7496)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:548)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getDelegationToken(AuthorizationProviderProxyClientProtocol.java:663)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:981)
marc_s
techie
  • Authentication error: `Token can be issued only with kerberos or web authentication`. Is the user that runs the Spark program allowed to access HDFS? – UninformedUser Jun 05 '17 at 19:46
  • @AKSW: Yes, I am able to run the program with spark-submit. The error occurs only when it runs from Oozie. I am also setting Hive credentials in the Oozie action. – techie Jun 05 '17 at 20:08
  • The "hive credentials" are fine if you run a Hive action. But Spark needs a **real Kerberos ticket** to create its own HDFS/YARN delegation token, its own Hive token, its own HBase token if needed, etc. I guess you have to use the `--principal` and `--keytab` arguments in the Spark command line, and pass the keytab with a `<file>` element in the action script. – Samson Scharfrichter Jun 05 '17 at 20:56
  • To be more specific: the Spark tasks done in the driver (and in the executors if run in `local` mode) may be OK with whatever credentials/tokens are provided by Oozie; but if Spark runs in `yarn-client` mode and spawns its executors in a different YARN job, then it needs a full-fledged Kerberos ticket. – Samson Scharfrichter Jun 05 '17 at 21:03
  • @SamsonScharfrichter: The keytab exists in a local directory, not in HDFS. Will that cause a problem when using the file element? – techie Jun 06 '17 at 07:03
  • That's a problem easily solved - just upload the keytab to HDFS, then **immediately restrict access to that file** e.g. `hdfs dfs -chmod g-rwx,o-rwx /path/to/xyz.keytab` then `hdfs dfs -setfacl -m user:spark-service-account:rw- /path/to/xyz.keytab` – Samson Scharfrichter Jun 06 '17 at 09:05
  • @SamsonScharfrichter, but I don't have the privileges to copy the keytab file. Is there any other approach? – techie Jun 06 '17 at 15:55
  • This is quite puzzling: to submit an Oozie workflow, you must upload the XML containing the workflow definition to HDFS first. Therefore if you don't have any privileges on HDFS, then you cannot submit Oozie workflows, you are not a developer, problem solved. – Samson Scharfrichter Jun 06 '17 at 17:15
  • Hi @SamsonScharfrichter can you please look in this problem, I am trying to get answer but no luck. https://stackoverflow.com/questions/70327262/getting-delegation-token-can-be-issued-only-with-kerberos-or-web-authentication – MD Rijwan Dec 12 '21 at 20:25
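
The keytab approach suggested in the comments above might look roughly like the Spark action below. This is a sketch under assumptions, not a tested configuration: the action name, class, jar path, principal, and keytab location are all placeholders, the `end` and `fail` nodes are assumed to be defined elsewhere in the workflow, and it assumes the `uri:oozie:spark-action:0.2` schema, which accepts a `<file>` element.

```xml
<!-- Hypothetical Spark action; every name and path here is a placeholder. -->
<action name="spark-write">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>WriteDataFrame</name>
        <class>com.example.WriteDataFrame</class>
        <jar>${nameNode}/path/to/app.jar</jar>
        <!-- Give Spark a real Kerberos login instead of relying on the tokens Oozie obtained. -->
        <spark-opts>--principal someuser@EXAMPLE.COM --keytab xyz.keytab</spark-opts>
        <!-- Ship the keytab from HDFS into the working directory under the name xyz.keytab. -->
        <file>${nameNode}/path/to/xyz.keytab#xyz.keytab</file>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Whether this works as written may depend on the Oozie and Spark versions in use, so treat it as a starting point for experimentation.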

2 Answers

1

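This happens because Oozie has already obtained delegation tokens before launching the Spark action. Spark then tries to request its own, but the NameNode issues delegation tokens only to callers authenticated with Kerberos (or web authentication), not to callers that are themselves authenticated by a token.

The solution is to tell Spark not to obtain the delegation token again, by adding the following to the Spark action in workflow.xml: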

<spark-opts>--conf spark.yarn.security.tokens.hive.enabled=false</spark-opts>
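
For context, here is a minimal sketch of where that element sits inside a Spark action. Only the `<spark-opts>` line comes from this answer; the application name, class, and jar path are hypothetical placeholders, and the `uri:oozie:spark-action:0.1` schema version is an assumption.

```xml
<!-- Hypothetical Spark action body; only the spark-opts value is taken from the answer. -->
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>WriteDataFrame</name>
    <class>com.example.WriteDataFrame</class>
    <jar>${nameNode}/path/to/app.jar</jar>
    <!-- Stop Spark from requesting a Hive delegation token of its own. -->
    <spark-opts>--conf spark.yarn.security.tokens.hive.enabled=false</spark-opts>
</spark>
```

Several `--conf key=value` pairs can share the same `<spark-opts>` element, separated by spaces.
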
Eric Lin
  • Hi @EricLin, what changes are needed when executing Hive through a shell action? Can you please look at this problem: https://stackoverflow.com/questions/70327262/getting-delegation-token-can-be-issued-only-with-kerberos-or-web-authentication – MD Rijwan Dec 12 '21 at 20:27
  • Sorry, I have not worked with Hive and Oozie for a long time, and my knowledge is out of date already. – Eric Lin Dec 15 '21 at 00:28
0

The analysis above is correct, but that solution did not work for me. What did work was telling Spark to skip the HDFS tokens that Oozie had already requested:

--conf spark.yarn.security.tokens.hadoopfs.enabled=false
--conf spark.yarn.security.credentials.hadoopfs.enabled=false
  • Hey Frank, can you please help in this : https://stackoverflow.com/questions/70327262/getting-delegation-token-can-be-issued-only-with-kerberos-or-web-authentication – MD Rijwan Dec 12 '21 at 20:28