
I am trying to launch my Spark batch job using Livy. From the logs, I see that the job starts running but fails when it tries to access the Hive metastore, with the following Kerberos error:

GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

The same job runs fine when I launch it with a spark-submit command; in that case, however, I pass the keytab and principal explicitly (--keytab, --principal).
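
For reference, the working invocation looks roughly like this (class name, jar, keytab path, and principal are placeholders):

```
# Working spark-submit invocation; all values below are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --class com.example.MyBatchJob \
  my-batch-job.jar
```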

I tried passing the keytab and principal in the Livy REST call using the spark.yarn.keytab and spark.yarn.principal parameters. Adding these options throws the following error:

Error: only one of --proxy-user or --principal can be provided

even though I do not provide the proxyUser parameter in my curl request.
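
For context, the Livy REST call is roughly the following (host, port, jar, and keytab/principal values are placeholders); the error above appears once the two `spark.yarn.*` entries are added to `conf`:

```
# Hypothetical Livy batch submission; all values below are placeholders
curl -X POST http://livy-host:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///jobs/my-batch-job.jar",
        "className": "com.example.MyBatchJob",
        "conf": {
          "spark.yarn.keytab": "/etc/security/keytabs/myuser.keytab",
          "spark.yarn.principal": "myuser@EXAMPLE.COM"
        }
      }'
```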

Kindly let me know how to resolve this issue.

smang
  • Livy uses **impersonation** >> you authenticate against the service, then the service authenticates against Hadoop using _its own Kerberos creds_ -- if `livy` is properly defined as a "proxy account" in the server-side `core-site.xml` (see the sketch after these comments), then it has the privilege to launch jobs under your account without presenting your creds. Just like Oozie, HiveServer2, Knox, etc. – Samson Scharfrichter May 17 '21 at 20:47
  • Except Livy does not run jobs directly on Hadoop; it starts a Spark job, and Spark accesses Hadoop resources. Hence Spark gets a property meaning "you run under a privileged (proxy) account, you must impersonate user X". That's what `--proxy-user` is about. – Samson Scharfrichter May 17 '21 at 20:51
  • Bottom line: either `livy` is not properly defined as a "proxy account" for the Hive Metastore and the Spark job fails to impersonate you, or the Livy / Spark configuration has a bad property and Spark does not generate a **delegation token** for the Metastore on startup. – Samson Scharfrichter May 17 '21 at 20:54
  • Kerberos auth does not work for distributed systems -- actually it's a very lame fit for Hadoop; Kerberos is used once per job-and-service pair, to obtain a "delegation token" valid from all worker nodes (Edge Nodes and YARN containers alike) to all service nodes. Spark gets its token at startup, then pushes the token to the driver and then to all executors. But only the HDFS/YARN token is mandatory; the others (Hive Metastore, HBase, etc.) are optional, and there is no trace of the token in the logs by default. – Samson Scharfrichter May 17 '21 at 20:59
  • TL;DR Check that `spark.security.credentials.hive.enabled` is true on the Livy server. And check that `livy` is defined as a "proxy account" for the Hive Metastore. – Samson Scharfrichter May 17 '21 at 21:06
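
A minimal sketch, assuming the Livy service runs as the `livy` account, of the "proxy account" definition mentioned in the comments above; it goes in the server-side `core-site.xml`, and the host/group values are placeholders (many clusters use `*` while testing):

```
<!-- core-site.xml: allow the livy service account to impersonate end users -->
<!-- hostname and group values below are placeholders -->
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>livy-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
```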

1 Answer


We can pass the keytab and principal in our Livy JSON. The parameters to pass are below:

"livy.server.auth.type":"kerberos",
"livy.server.auth.kerberos.keytab":"<hdfs_location>/<keytab_file>",
"livy.server.auth.kerberos.principal":"<Principal_name>",
"livy.server.launch.kerberos.keytab":"<hdfs_location>/<keytab_file>",
"livy.server.launch.kerberos.principal":"<Principal_name>"

This also helps avoid token-expiry issues in streaming jobs.
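
For reference, these launch credentials are typically set server-side in `livy.conf` rather than sent with each request; a minimal sketch with placeholder paths and principals:

```
# livy.conf sketch -- all paths and principals below are placeholders
# Identity Livy presents to clients (SPNEGO) when auth is enabled
livy.server.auth.type = kerberos
livy.server.auth.kerberos.keytab = /etc/security/keytabs/spnego.service.keytab
livy.server.auth.kerberos.principal = HTTP/_HOST@EXAMPLE.COM
# Identity Livy itself uses to launch Spark jobs on the cluster
livy.server.launch.kerberos.keytab = /etc/security/keytabs/livy.service.keytab
livy.server.launch.kerberos.principal = livy/_HOST@EXAMPLE.COM
```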

Charan HS