I've built a mapper and a reducer in Ruby, and they run successfully as a streaming job. However, I need to run a second map and reduce on the output of the first reduce.

Is there any way to define multiple Ruby files for mappers and reducers in a single streaming job, so that the stages are chained?

tolgap

1 Answer

No, not within a single streaming job.

You can chain two streaming jobs, though, and just use the output directory from the first as the input directory for the second.
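As a rough sketch of that chaining, a small Ruby driver could build one `hadoop jar` command per stage and stop as soon as a stage fails. Everything named here is an assumption for illustration: the script names (`map1.rb`, `reduce1.rb`, `map2.rb`, `reduce2.rb`), the directories (`input`, `out/stage1`, `out/final`), the `HADOOP_STREAMING_JAR` environment variable, and the `--run` switch. The streaming jar's location depends on your installation.

```ruby
# Sketch of a two-stage Hadoop Streaming chain.
# All file names, directory names and the HADOOP_STREAMING_JAR variable are
# hypothetical placeholders; adjust them to your own setup.

# The streaming jar path is installation-specific, so it is read from the
# environment (assumed variable name) with a placeholder fallback.
STREAMING_JAR = ENV.fetch("HADOOP_STREAMING_JAR", "hadoop-streaming.jar")

# Hypothetical stages: each has its own mapper, reducer and output directory.
# The scripts are assumed to be executable and to start with a Ruby shebang.
STAGES = [
  { mapper: "map1.rb", reducer: "reduce1.rb", output: "out/stage1" },
  { mapper: "map2.rb", reducer: "reduce2.rb", output: "out/final" }
]

# Build the argument list for a single streaming job.
def streaming_command(input:, output:, mapper:, reducer:)
  ["hadoop", "jar", STREAMING_JAR,
   "-input", input, "-output", output,
   "-mapper", mapper, "-reducer", reducer,
   "-file", mapper, "-file", reducer]
end

# Build one command per stage; every stage after the first reads the
# previous stage's output directory as its input.
def chain_commands(first_input, stages)
  input = first_input
  stages.map do |stage|
    cmd = streaming_command(input: input, output: stage[:output],
                            mapper: stage[:mapper], reducer: stage[:reducer])
    input = stage[:output]
    cmd
  end
end

# Run the commands in order and stop at the first one that fails.
# Kernel#system returns true only when the command exits with status 0.
def run_chain(commands, runner: ->(cmd) { system(*cmd) })
  commands.all? { |cmd| runner.call(cmd) }
end

# Jobs are only launched when the script is called with --run.
if __FILE__ == $PROGRAM_NAME && ARGV.include?("--run")
  exit(run_chain(chain_commands("input", STAGES)) ? 0 : 1)
end
```

The same thing can be done with two plain `hadoop jar` lines in a shell script; the driver just keeps the "output of stage one is input of stage two" wiring in one place and skips the second job when the first one fails.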

cohoz
  • So just call a second streaming job in my `job.sh` file right after the first one? Will try doing that. – tolgap Jan 15 '14 at 11:13
  • Correct. You can also do the usual things to ensure that the first job completed successfully rather than always launching the second job (but even without this, typically the second job would fail quickly, since its input directory wouldn't exist). For more complex dependencies, you should probably investigate some of the existing tools to handle map-reduce workflows. – cohoz Jan 16 '14 at 04:07
  • Do you have a recommendation for me? – tolgap Jan 16 '14 at 09:01