
Right now I can submit Spark jobs through Livy, and the spark-submit command it generates contains a `--proxy-user livy` parameter so Livy can impersonate spark and run the spark-submit. However, I want to know how to do this without the `--proxy-user` parameter, because the company has thousands of Hadoop nodes and we are not allowed to create new users on them. (For testing, we had to create the livy user with `adduser livy` on all the worker nodes for the `--proxy-user` parameter to work; otherwise it failed with an error saying the livy user was not found.)

I am currently submitting my Spark job to Livy through a REST API POST call, following this documentation: https://livy.incubator.apache.org/docs/latest/rest-api.html. The POST section of that documentation lists a `proxyUser` parameter, and it becomes the `--proxy-user` parameter in the spark-submit command. I need a way to avoid `proxyUser`, but I cannot find a way to disable it: if I don't specify `proxyUser` in the POST call, Livy still adds `--proxy-user livy` to the spark-submit command, and I don't want that.
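For reference, a Livy batch submission to `POST /batches` is just a JSON body in which `proxyUser` is an optional field, so leaving it out of the body is the only control the client has. The sketch below builds (but does not send) such a request using only the Python standard library. The server URL, jar path, class name and argument are hypothetical placeholders, and on a Kerberized cluster the request would also need SPNEGO authentication, which is not shown. As the question notes, omitting `proxyUser` does not stop the server from adding `--proxy-user` on its own when impersonation is enabled.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; Livy listens on port 8998 by default.
LIVY_URL = "http://livy-host.example.com:8998"


def build_batch_request(livy_url, file, class_name=None, args=None,
                        proxy_user=None, conf=None):
    """Build (without sending) a POST /batches request for Livy.

    Optional fields are only included when set, so proxyUser is left
    out of the JSON body unless a value is passed explicitly.
    """
    body = {"file": file}
    if class_name is not None:
        body["className"] = class_name
    if args:
        body["args"] = list(args)
    if conf:
        body["conf"] = dict(conf)
    if proxy_user is not None:
        body["proxyUser"] = proxy_user

    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Only checked by Livy when CSRF protection is enabled on the server.
            "X-Requested-By": "livy-client",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_batch_request(
        LIVY_URL,
        file="hdfs:///user/jobs/my-spark-job.jar",  # hypothetical jar path
        class_name="com.example.MySparkJob",        # hypothetical main class
        args=["2020-02-07"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit: urllib.request.urlopen(req)
```

Whether the resulting spark-submit carries `--proxy-user` is then decided server-side, by the authenticated caller and the `livy.impersonation.enabled` setting discussed in the comments.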

Does anyone know how to disable the proxyUser parameter?

JYCH
  • _"Livy can impersonate spark"_ > no! Livy can impersonate **the end user (or service account)** that submitted the job. – Samson Scharfrichter Feb 07 '20 at 15:50
  • _"we are not allowed to create new users ... `adduser livy` on all the worker nodes"_ > that's usually the job of Cloudera Manager / Ambari to create the local Linux accounts for Hadoop services (and make sure their Kerberos creds are mapped to these Linux accounts). – Samson Scharfrichter Feb 07 '20 at 15:57
  • If you set `livy.impersonation.enabled = false` in _livy.conf_, then all jobs will be submitted as the `livy` service account. But that completely defeats the purpose of Kerberos authentication. – Samson Scharfrichter Feb 07 '20 at 16:00
  • Hi @SamsonScharfrichter, sorry, I'm still new to Kerberos and Hadoop, so my questions may have been confusing. After the last few days, here is my update and question: right now Kerberos, Livy and spark-submit all work fine, but this requires the Livy Linux account to exist on the workers. However, that is not allowed, because I'm installing my software+platform for another org that doesn't give us root access (so no creating accounts), which also means I am not allowed to change any Livy configs. With these constraints, how do I get Livy past Kerberos for spark-submit? – JYCH Feb 07 '20 at 17:46
  • I tried the `livy.impersonation.enabled = false` setting, and it does get rid of the proxyUser, but as you say, that defeats the purpose of Kerberos. – JYCH Feb 07 '20 at 17:47
  • _"installing my software+platform for another org who doesn't allow us root access"_ > then you present them **your** requirements, i.e. have the `livy` account created on master/worker nodes -- and the node(s) actually running Livy; have a Kerberos SPN provisioned for each of your Livy instance(s) and mapped to the `livy` Linux account (the default Hadoop `auth_to_local` should be enough); have that account defined as a "proxyuser" for Hadoop core services. Let them manage it their own way. – Samson Scharfrichter Feb 08 '20 at 18:31
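The "proxyuser" requirement in the last comment corresponds to Hadoop's standard impersonation properties, `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups`, set in `core-site.xml`. The sketch below renders those two entries for the `livy` account so they can be handed to the cluster admins as a concrete request. The host name and the `*` group value are placeholder assumptions that the admins would replace with their own policy; the sketch does not create Linux accounts or Kerberos principals, which remain their job.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(service_user, hosts, groups):
    """Render Hadoop proxy-user (impersonation) settings for core-site.xml.

    hadoop.proxyuser.<user>.hosts  : hosts the service may impersonate from
    hadoop.proxyuser.<user>.groups : groups whose members it may impersonate
    """
    config = ET.Element("configuration")
    for suffix, values in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{service_user}.{suffix}"
        ET.SubElement(prop, "value").text = ",".join(values)
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: the host running Livy and the groups it may impersonate.
    print(proxyuser_properties(
        "livy",
        hosts=["livy-host.example.com"],
        groups=["*"],
    ))
```

The admins would merge these `<property>` entries into their existing `core-site.xml` rather than replace the file.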
