
I have a cluster secured by Kerberos, and a REST API that needs to interact with the cluster on behalf of the user. I have used Spring Security with SPNEGO to authenticate the user, but when I try to use the Hadoop SDK, it fails in different ways depending on what I try.

When I try to use the SDK directly after the user logs in, it gives me `SIMPLE authentication is not enabled`.

I have noticed the session's `Authentication` is a `UsernamePasswordAuthenticationToken`, which does not make sense, since I'm authenticating against the Kerberos realm with the user's credentials.

I am trying to use this project out of the box with my own service account and keytab: https://github.com/spring-projects/spring-security-kerberos/tree/master/spring-security-kerberos-samples/sec-server-spnego-form-auth

Benny
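
For reference, a minimal sketch of the conventional Hadoop-side setup for this scenario: the service logs in once from its keytab, then impersonates the authenticated end user as a proxy user. Principals and file paths below are placeholder assumptions, and the cluster must be configured to allow the impersonation (`hadoop.proxyuser.*` properties):

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsAsEndUser {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(false);
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml")); // assumed location
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        // If the XML files are not actually read, this property silently stays
        // "simple", which later surfaces as "SIMPLE authentication is not enabled".
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log the service account in once from its keytab (placeholder names)...
        UserGroupInformation.loginUserFromKeytab(
                "svc-rest@EXAMPLE.COM", "/etc/security/keytabs/svc-rest.keytab");

        // ...then impersonate the authenticated end user for each request.
        UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
                "alice", UserGroupInformation.getLoginUser());

        FileStatus[] listing = proxyUgi.doAs(
                (PrivilegedExceptionAction<FileStatus[]>) () ->
                        FileSystem.get(conf).listStatus(new Path("/user/alice")));
        for (FileStatus status : listing) {
            System.out.println(status.getPath());
        }
    }
}
```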
  • When using the "Hadoop SDK" I guess you create a Hadoop `Configuration` and a Hadoop `UserGroupInformation`. How do you feed the configuration properties to the `Configuration`, implicitly *(i.e. drop `core-site.xml` etc. in a local directory and add the directory to CLASSPATH)* or explicitly? If implicitly, did you check that the file(s) are actually read *(otherwise Hadoop reverts silently to hard-coded defaults e.g. authentication SIMPLE, and your program will crash and burn with meaningless Exception messages later)*? – Samson Scharfrichter Mar 07 '16 at 17:20
  • If it's not already the case, you can raise some debugging flags as explained in https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/secrets.html i.e. `-Dsun.security.krb5.debug=true` and `export HADOOP_JAAS_DEBUG=true` – Samson Scharfrichter Mar 07 '16 at 17:30
  • I am adding config files explicitly, but I am trying to avoid using UGI at all since it uses static members throughout and I need to support thread-safe operations. I probably have a foundational misunderstanding of what needs to be present for everything to work under the user's principal, but I was hoping it would be as simple as obtaining a reference to a `Subject` and running my HDFS listing, etc. inside a `Subject.doAs` *(see the sketch after these comments)* – Benny Mar 07 '16 at 17:52
  • Ahem - you should carefully read the GitBook linked above; it's written by the guy who wants to rewrite the whole UGI stuff *(and stuff in ZK that smells even worse)*, out of bad experience maintaining it... – Samson Scharfrichter Mar 07 '16 at 21:07
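
Regarding the `Subject.doAs` idea above: Hadoop does offer a bridge from a JAAS `Subject` via `UserGroupInformation.getUGIFromSubject`. A minimal sketch, assuming the `Subject` already carries the user's Kerberos credentials (a bare SPNEGO handshake does not leave a TGT there unless credential delegation is in play; see the first answer below). Note that `UserGroupInformation` still holds process-wide static state internally, so this does not fully escape the statics concern:

```java
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SubjectBasedListing {
    // `subject` must already contain the user's Kerberos ticket (e.g. obtained
    // through credential delegation); plain SPNEGO authentication alone does
    // not provide one.
    static FileStatus[] list(Subject subject, Configuration conf, String dir) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.getUGIFromSubject(subject);
        // Run the HDFS call with the user's identity attached.
        return ugi.doAs((PrivilegedExceptionAction<FileStatus[]>) () ->
                FileSystem.get(conf).listStatus(new Path(dir)));
    }
}
```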

2 Answers


First of all, the Spring Security Kerberos extension is a terrible piece of code. I have evaluated it once and abstained from using it. You need the credential of the client authenticating to your cluster. You basically have two options here:

  1. If you are on Tomcat, you can try the JEE pre-auth wrapper from Spring Security along with my Tomcat SPNEGO AD Authenticator from trunk. It will receive the delegated credential from the client, which will enable you to perform your task, assuming that your server account is trusted for delegation.
  2. If the above is not an option, resort to S4U2Proxy/S4U2Self with Java 8: obtain a Kerberos ticket on behalf of the user principal and then perform your REST API call (see the sketch below).
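
A minimal sketch of option 2 with the JDK 8 JGSS extension (`com.sun.security.jgss.ExtendedGSSCredential.impersonate`). It assumes the service has already performed its own JAAS Kerberos login (e.g. from a keytab, wrapped in `Subject.doAs`) and that the service account is trusted for delegation; the principal name is a placeholder:

```java
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.GSSName;
import org.ietf.jgss.Oid;
import com.sun.security.jgss.ExtendedGSSCredential;

public class S4U2SelfSketch {
    static GSSCredential credentialFor(String userPrincipal) throws GSSException {
        GSSManager manager = GSSManager.getInstance();
        Oid krb5Mech = new Oid("1.2.840.113554.1.2.2"); // Kerberos v5 mechanism OID

        // The service's own credential, taken from the current JAAS login.
        GSSCredential self = manager.createCredential(
                null, GSSCredential.DEFAULT_LIFETIME, krb5Mech, GSSCredential.INITIATE_ONLY);

        // S4U2Self: obtain a credential on behalf of the end user.
        GSSName user = manager.createName(userPrincipal, GSSName.NT_USER_NAME);
        return ((ExtendedGSSCredential) self).impersonate(user);
    }
}
```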

As soon as you have the `GSSCredential`, the flow is the same.

Disclaimer: I have no idea about Hadoop but the GSS-API process is always the same.

Michael-O
  • *"I have no idea about Hadoop but the GSS-API process is always the same"* -- you are an optimist... the GitBook about "Kerberos and Hadoop" is subtitled **"Madness beyond the Gate"** with excerpts from H.P. Lovecraft all over the place *(and the matching ApacheCon presentations also have some related artwork)* – Samson Scharfrichter Mar 10 '16 at 13:31
  • @SamsonScharfrichter Maybe the stuff has not been done right. It's open source; improve it, since you know more about Hadoop than I do. If I were using Hadoop, I'd do it myself. – Michael-O Mar 10 '16 at 13:34
  • The HortonWorks staff is working on it... and they are way more qualified than anyone else (and paid for that, too). In the meantime we can just wail and weep. – Samson Scharfrichter Mar 10 '16 at 13:49
  • @SamsonScharfrichter If they were, there wouldn't be books with such titles. – Michael-O Mar 10 '16 at 14:26
  • **They** wrote that book. And **they** started [HADOOP-12649] and [ZOOKEEPER-2344] and [HADOOP-12897] etc. -- also fixed several bugs in SPARK for the last release. – Samson Scharfrichter Mar 10 '16 at 19:11
  • *"I have evaluated it once and abstained from using it"* - could you expand on **why**? What makes your implementation better? *"Terrible"* isn't a particularly helpful review. – jonrsharpe Oct 25 '16 at 14:28

For what it's worth, you can leverage Apache Knox (http://knox.apache.org) to consume the Hadoop REST APIs in a secured cluster. Knox takes care of the SPNEGO negotiation with the various components for you. You could use the HTTP-header-based pre-auth SSO provider to propagate the identity of your end user to Knox.

Details: http://knox.apache.org/books/knox-0-8-0/user-guide.html#Preauthenticated+SSO+Provider

Note, however, that if you use that provider you will need to ensure that only trusted clients can call your service.

Alternatively, you can authenticate to Knox with username/password against LDAP using the default Shiro provider.
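
For illustration, a minimal sketch of calling WebHDFS through Knox with the Shiro provider (HTTP Basic authentication). The host, topology name (`default`), path, and credentials are placeholders; with the pre-auth SSO provider you would instead pass the end user's identity in a trusted request header (`SM_USER` being, as far as I know, the default header name):

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KnoxWebHdfsSketch {
    public static void main(String[] args) throws Exception {
        // Knox proxies WebHDFS under /gateway/<topology>/webhdfs/v1/...
        URL url = new URL(
                "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String token = Base64.getEncoder()
                .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        System.out.println(conn.getResponseCode()); // 200 on success
    }
}
```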

One of the great benefits of using Knox this way is that your service never needs to know anything about whether the cluster is kerberized. Knox abstracts that from you.

lmccay