2

I can successfully kick off a Hadoop streaming job from the terminal, but I am looking for ways to start streaming jobs via an API, Eclipse, or some other means.

The closest I found was this post, https://stackoverflow.com/questions/11564463/remotely-execute-hadoop-streaming-job, but it has no answers.

Any ideas or suggestions would be welcome.

Mark Vickery

3 Answers

2

Interesting question. I found a way to do this; hopefully it will help you too.

The first method should work on Hadoop 0.22:

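import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.streaming.StreamJob;
import org.apache.hadoop.util.ToolRunner;

// The class name is arbitrary.
public class RunStreamingJob {
    public static void main(String[] args) {
        // Point the client at the cluster; replace xxxxx with your host names.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://xxxxx:9000");
        conf.set("mapred.job.tracker", "hdfs://xxxxx:9001");
        StreamJob sj = new StreamJob();
        try {
            // Same arguments as on the command line; generic -D options come first.
            int exitCode = ToolRunner.run(conf, sj, new String[] {
                    "-D", "stream.tmpdir=c:\\",
                    "-mapper", "/path/to/mapper.py",
                    "-reducer", "/path/to/reducer.py",
                    "-input", "/path/to/input",
                    "-output", "/path/to/output" });
            System.out.println("Streaming job exit code: " + exitCode);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}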
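If the argument list gets longer, it can help to assemble it in one place before handing it to ToolRunner. Below is a minimal sketch of that idea; the StreamingArgs class and its build method are hypothetical helpers of mine, not part of Hadoop, and the block deliberately has no Hadoop dependency so it can be run on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): assembles the argument array
// that would be passed to ToolRunner.run(conf, new StreamJob(), args).
class StreamingArgs {

    static String[] build(String mapper, String reducer,
                          String input, String output,
                          String... genericOptions) {
        List<String> args = new ArrayList<>();
        // Generic options such as "-D", "key=value" go before the
        // streaming-specific options.
        args.addAll(Arrays.asList(genericOptions));
        args.addAll(Arrays.asList(
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", input,
                "-output", output));
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        String[] args = build("/path/to/mapper.py", "/path/to/reducer.py",
                "/path/to/input", "/path/to/output",
                "-D", "stream.tmpdir=/tmp");
        System.out.println(String.join(" ", args));
        // With the Hadoop jars on the classpath, the array would then go to:
        //   int exitCode = ToolRunner.run(conf, new StreamJob(), args);
    }
}
```

A non-zero return value from ToolRunner.run indicates that the job did not complete successfully, so it is worth checking rather than discarding.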

I also found this Java wrapper which you should be able to run.

Charles Menguy
  • I am trying to do as you have said here, but I get back exit code 5. Any idea how to interpret that? – Mahdi Jun 18 '15 at 16:03
  • Never mind. My problem was adding the right dependencies and then include map-red.xml and yarn-site.xml into my YarnConfiguration. – Mahdi Jun 22 '15 at 14:38
1

Take a look at Apache Oozie. Once you have defined your job via XML, you can launch it with an HTTP POST to the Oozie server.
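A rough sketch of what that POST could look like from Java, using only the JDK. The assumptions here are mine: that the Oozie Web Services API accepts job submissions at /v1/jobs?action=start with an XML configuration body, and that a workflow has already been deployed to HDFS. The host name, user name, and workflow path are placeholders, and the OozieSubmit class is hypothetical; check the Oozie documentation for your version before relying on it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client class; not part of Oozie.
class OozieSubmit {

    // Renders job properties as a Hadoop-style <configuration> document.
    static String toConfigurationXml(Map<String, String> props) {
        StringBuilder xml = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append("  <property><name>").append(escape(e.getKey()))
               .append("</name><value>").append(escape(e.getValue()))
               .append("</value></property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    // Escapes the characters that are significant in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // POSTs the configuration to the Oozie server and returns the HTTP status.
    // Assumed endpoint: <oozieUrl>/v1/jobs?action=start
    static int submit(String oozieUrl, String configurationXml) throws IOException {
        URL url = new URL(oozieUrl + "/v1/jobs?action=start");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml;charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(configurationXml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("user.name", "hadoopuser");                       // placeholder
        props.put("oozie.wf.application.path",
                "hdfs://namenode:9000/user/hadoopuser/streaming-wf"); // placeholder
        String xml = toConfigurationXml(props);
        System.out.println(xml);
        // Only contact a server when a base URL is given,
        // e.g. http://oozie-host:11000/oozie
        if (args.length > 0) {
            System.out.println("HTTP status: " + submit(args[0], xml));
        }
    }
}
```

Run without arguments, it only prints the request body, which makes it easy to inspect what would be sent before pointing it at a real server.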

Chris White
0

When the Hadoop streaming job is run as

hadoop jar /home/training/Installations/hadoop-1.0.3/contrib/streaming/hadoop-streaming-1.0.3.jar -input input4 -output output4 -mapper /home/training/Code/Streaming/max_temperature_map.rb -reducer /home/training/Code/Streaming/max_temperature_reduce.rb

then org.apache.hadoop.streaming.HadoopStreaming is executed. This class is named as the main class in the MANIFEST.MF of hadoop-streaming-1.0.3.jar. Check the source of the org.apache.hadoop.streaming.HadoopStreaming Java class for the API details.
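To confirm which class a jar will run, the Main-Class entry can be read from its manifest with the JDK's java.util.jar classes. Here is a small sketch; the MainClassOf class and the throwaway demo jar are my own illustration, so pass the path to your actual streaming jar as the first argument to inspect the real one.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical utility class for illustration.
class MainClassOf {

    // Returns the Main-Class attribute of the jar's manifest, or null if absent.
    static String mainClass(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null
                    : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Builds a temporary jar containing only a manifest that names mainClass,
    // so the example can run without a Hadoop installation.
    static File demoJar(String mainClass) throws IOException {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        File jar = File.createTempFile("demo", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            // No further entries needed; the manifest is written by the constructor.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = args.length > 0
                ? new File(args[0])
                : demoJar("org.apache.hadoop.streaming.HadoopStreaming");
        System.out.println(mainClass(jar));
    }
}
```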

Praveen Sripati