
I have a web application that interacts with Hadoop (Cloudera CDH3u6). A particular user operation should launch a new MapReduce job in the cluster.

The cluster is not a secure cluster, but it uses simple group authentication - so if I ssh to it as myself, I can launch MR jobs from the command line.

In the web application, I'm using the ToolRunner to run my job:

MyMapReduceWrapperClass mr = new MyMapReduceWrapperClass();
ToolRunner.run(mr, null);

// inside the run() implementation of my wrapper class:
Job job = new Job(conf, "job title");
// set-up stuff removed
job.submit();

Currently the job is submitted as the user who launched the web application server (Tomcat) process, and that user is a special local account on the web server that doesn't have permission to submit jobs to the cluster.

Ideally I'd like to get some kind of identity from the user and pass it along, so that as different users interact with the web app / service we could see who was invoking which jobs. Setting aside the question of how to actually coordinate those credential services, I'm not even clear on where that identity would go.

I see that a Job has a getCredentials() method, but from reading about the token / Kerberos machinery in there I have the impression that it's meant for secured clusters (which I believe ours is not) - not to mention I don't think my web server has Kerberos installed. That could be fixed, though. But it also sounds like the intended use case is to add secrets that a MapReduce job might need while running to access other services - not to run the job as someone else.
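For what it's worth, that reading of the Credentials API seems right: it ships secrets to the running tasks rather than changing the submitting identity. A minimal sketch of that usage, with a made-up alias and secret value:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Attach a secret that the tasks can read back at runtime through the job's Credentials.
// This has no effect on which user the job is submitted as.
Job job = new Job(conf, "job title");
job.getCredentials().addSecretKey(new Text("my.service.key"), "s3cret".getBytes());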

I also see that the (older?) JobConf class has a setUser(String name) method, which seems promising - even though I don't see where it would ever ask for a password or the like - but I can't find much information or documentation on that function. I tried it out (along the lines of the sketch below) and it had no impact - the job was still submitted as the Tomcat user.
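A minimal sketch of that attempt, assuming the older mapred API and a hypothetical user name:

import org.apache.hadoop.mapred.JobConf;

// setUser() only writes a user name into the job configuration;
// under simple authentication the submitting identity still comes from the client process.
JobConf jobConf = new JobConf(conf, MyMapReduceWrapperClass.class);
jobConf.setUser("someClusterUser");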

Are there other avenues to explore or research? I am out of keywords to Google. I would prefer not to end up with "just give your Tomcat user permissions on the cluster" - I don't manage that asset and I don't expect that request to fly. If that literally is my only option, however, I'd like to understand why, so that I can argue the need with the right information.

Mikeb
  • I believe you're looking for something similar to what is referred to as secure impersonation: http://hadoop.apache.org/docs/stable/Secure_Impersonation.html - it's a starting point – Chris White Apr 19 '13 at 16:11
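For anyone following that link: secure impersonation pairs UserGroupInformation.createProxyUser on the client with proxy-user settings in the cluster's core-site.xml. A rough sketch of that cluster-side config, assuming the Tomcat process runs as a local account named tomcat (the host and group values are placeholders):

<!-- core-site.xml on the cluster: allow the "tomcat" account to impersonate end users -->
<property>
  <name>hadoop.proxyuser.tomcat.hosts</name>
  <value>webserver.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.tomcat.groups</name>
  <value>hadoop-users</value>
</property>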

1 Answer


You can use the UserGroupInformation class like this:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.ToolRunner;

// Build a UGI for the target user; with simple authentication no password or token is needed.
UserGroupInformation ugi = UserGroupInformation.createRemoteUser(username);

// Everything inside run() executes as that user, including the job submission.
ugi.doAs(new PrivilegedExceptionAction<MyMapReduceWrapperClass>() {
    public MyMapReduceWrapperClass run() throws Exception {
        MyMapReduceWrapperClass mr = new MyMapReduceWrapperClass();
        ToolRunner.run(mr, null);
        return mr;
    }
});
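In a servlet container, the username handed to createRemoteUser would typically come from whatever authentication the web app already performs. One possibility, assuming container-managed authentication (the fallback account name here is hypothetical):

import javax.servlet.http.HttpServletRequest;

// Hypothetical helper: take the end user's identity from the container's authentication,
// falling back to the service account when the request is unauthenticated.
String resolveSubmittingUser(HttpServletRequest request) {
    String remoteUser = request.getRemoteUser();
    return remoteUser != null ? remoteUser : "tomcat";
}

Note that with simple authentication Hadoop trusts whatever user name the client asserts, which is why no password is required here.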
highlycaffeinated
  • This did indeed work. I don't understand why I don't need to (apparently) supply any credentials for the username I chose. I had also tried using `createProxyUser` but that generated errors relating to not being able to impersonate my Hadoop user from my Tomcat user. Still, progress is progress, thanks! – Mikeb Apr 23 '13 at 14:32
  • @Mikeb, @highlycaffeinated does this require any additional configuration changes? It's not working in my case. – Akash Mahajan Mar 02 '16 at 05:38
  • @AkashMahajan How is it not working? How is your cluster secured? You should open a new question with the specifics of your issue so we can help. – highlycaffeinated Mar 02 '16 at 15:06
  • No, it's not secured. The core-site.xml just has the fs.defaultFS and hadoop.tmp.dir properties, and there's nothing additional in yarn-site.xml or hdfs-site.xml. That's why I am asking what other config is required. – Akash Mahajan Mar 02 '16 at 19:26