I can connect to a non-Kerberized Spark cluster through the Livy service without problems from a remote RStudio Desktop (Windows).

However, once Kerberos security is enabled, the connection fails:

library(sparklyr)
sc <- spark_connect("http://host:8998", method = "livy")

returning

Error in livy_validate_http_response("Failed to create livy session",  : 
Livy operation is unauthorized. Try spark_connect with config = livy_config()

This is with sparklyr_0.5.6-9002 and MIT Kerberos for Windows for authentication.

On the other hand, from within the cluster (i.e. through curl), the connection succeeds.

What am I doing wrong? What additional settings are required for such a connection?

The livy_config(..., username, password) config seems to produce only an Authorization: Basic ... header, whereas I'd suspect that a Negotiate or Kerberos(?) header is required here instead.
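
To make the distinction concrete, here is a minimal sketch of how a Basic header is derived from a username and password. The helper name `basic_auth_header` and the credentials are made up for illustration, and the `openssl` package is assumed to be installed; this is not sparklyr's own code.

```r
library(openssl)

# Hypothetical helper: builds the value of an "Authorization: Basic" header.
# Basic auth is just base64("<username>:<password>"), so it can be computed
# from the credentials alone.
basic_auth_header <- function(username, password) {
  credentials <- charToRaw(paste0(username, ":", password))
  paste("Basic", base64_encode(credentials))
}

basic_auth_header("alice", "secret")
```

A Negotiate header cannot be built this way: it carries a SPNEGO token that the HTTP client obtains from the local Kerberos ticket cache, so the client library itself has to support GSS-API/SSPI negotiation.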

Are there any other possible configurations I'm missing?

NB: the same error is returned from RStudio Server (web) after kinit'ing from the shell as an authorized user.

runr
  • The Livy server should be configured for launch and auth as shown below. – Kangrok Lee Aug 02 '17 at 06:28
  • In the livy.conf file, you should set `livy.server.launch.kerberos.principal = XXX`, `livy.server.launch.kerberos.keytab = XXX`, `livy.server.auth.kerberos.principal = XXX`, and `livy.server.auth.kerberos.keytab = XXX` (the SPNEGO keytab). Please refer to http://henning.kropponline.de/2016/11/06/connecting-livy-to-a-secured-kerberized-hdp-cluster/ – Kangrok Lee Aug 02 '17 at 06:35
  • @KangrokLee Thanks for your reply! Livy is properly configured (I probably forgot to mention that in the post, will edit): your suggested settings are in place and Livy works fine from within the cluster (i.e. from RStudio Server or curl). The problem described arises when trying to connect from a remote RStudio Desktop on Windows. – runr Aug 02 '17 at 08:47
  • Did you ever get this working? – ansek Jan 01 '18 at 17:08
  • @ansek No, not directly through Kerberos. However, a workaround was to connect through [Knox Gateway](https://knox.apache.org/) with [Livy added as a service](https://community.hortonworks.com/articles/70499/adding-livy-server-as-service-to-apache-knox.html). This way Kerberos is handled by Knox within the cluster, while the remote RStudio Desktop client only needs ``Basic`` authentication (which goes over SSL, so it doesn't seem too bad). Note that in my case it also required some minor tweaking of ``sparklyr``'s source code; I'm not sure whether this has been fixed in the latest updates. – runr Jan 02 '18 at 20:03

1 Answer

I'm coming late to the party, but I had the same problem and was finally able to solve it. This could be useful to others.

Of course this may depend a lot on your cluster configuration. I'm using sparklyr 1.5.0, and MIT Kerberos for Windows, with direct connection to Livy (no Knox proxy) running in a Cloudera HDP cluster (Spark 2.3.0). In my case an extra HTTP header was required, see below.

If your cluster doesn't allow outgoing internet connections, you should also first save the sparklyr server-side jar on HDFS (by default it is downloaded automatically from GitHub).

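library(sparklyr)
SPARK_VERSION = "2.3.0"

# negotiate = TRUE switches the Livy authentication to Kerberos (SPNEGO);
# the extra X-Requested-By header was required by this cluster
lcfg = livy_config(
  negotiate = TRUE,
  custom_headers = list("X-Requested-By" = "<user_name>"))
# Use the sparklyr jar previously saved on HDFS instead of downloading it from GitHub
lcfg$sparklyr.livy.jar = "hdfs:///path/to/sparklyr-2.3-2.11.jar"

sc = spark_connect(
  master = "http://livyserver:8999", method = "livy",
  version = SPARK_VERSION,
  config = lcfg)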
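
The jar referenced in `sparklyr.livy.jar` above has to be on HDFS before connecting. As a rough sketch of where to get it: to my understanding the jars ship inside the installed sparklyr package, in its `java` directory, but treat that location as an assumption and check your own installation. The helper name `find_sparklyr_jar` is made up for illustration.

```r
# Hypothetical helper: returns the path of the sparklyr jar for a Spark/Scala pair.
# Assumes the "sparklyr-<spark>-<scala>.jar" naming used in the answer and that
# the jars live in the "java" directory of the installed sparklyr package.
find_sparklyr_jar <- function(spark = "2.3", scala = "2.11",
                              jar_dir = system.file("java", package = "sparklyr")) {
  jar_path <- file.path(jar_dir, sprintf("sparklyr-%s-%s.jar", spark, scala))
  if (!file.exists(jar_path)) {
    stop("No jar found at ", jar_path, call. = FALSE)
  }
  jar_path
}

# Usage (requires sparklyr to be installed):
# jar <- find_sparklyr_jar()
# Then, from a machine with HDFS access (destination path is a placeholder):
#   hdfs dfs -put <jar> /path/to/
```

The upload itself is left as a shell command in the comments, since it has to run somewhere with an HDFS client and a valid Kerberos ticket.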

For debugging, a good first step is to test your Livy setup from outside the cluster but without R; see https://livy.apache.org/examples/
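
If no other HTTP client is handy on the Windows machine, a similar check can be sketched with `httr` alone, bypassing sparklyr. The helper names below are made up for illustration, and the sketch assumes that the underlying libcurl build supports GSS-API/SSPI negotiation and that a valid Kerberos ticket is available.

```r
library(httr)

# Hypothetical helper: request settings for a Kerberized Livy endpoint.
# type = "gssnegotiate" asks libcurl to authenticate with a SPNEGO token
# taken from the local ticket cache; user and password stay empty.
livy_request_config <- function(requested_by) {
  c(
    authenticate("", "", type = "gssnegotiate"),
    add_headers("X-Requested-By" = requested_by)
  )
}

# Hypothetical helper: lists the Livy sessions through GET /sessions.
livy_list_sessions <- function(base_url, requested_by) {
  resp <- GET(paste0(base_url, "/sessions"), livy_request_config(requested_by))
  stop_for_status(resp)  # a 401 here points to a Kerberos/SPNEGO problem
  content(resp, as = "parsed", type = "application/json")
}

# Usage (needs network access to Livy):
# str(livy_list_sessions("http://livyserver:8999", "<user_name>"))
```

If this call is rejected as unauthorized as well, the problem lies in the Kerberos setup on the client rather than in sparklyr.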

Pierre Gramme
  • Looks promising, will try it out as soon as I get the chance! As a side note, do you have direct access to the Kerberos server? I later noticed that many issues came from using Kerberos through an ssh tunnel (with the proper fqdn in the hosts file), which sometimes caused problems with GSS Negotiate. – runr Dec 01 '20 at 02:44
  • I'm on a VPN, no SSH tunnel, so can't tell anything about that problem – Pierre Gramme Dec 02 '20 at 19:34