
I am pretty new to Hadoop, and particularly to Hadoop Job Scheduling. Here is what I am trying to do.

I have 2 flows, each containing a Hadoop job. I am free to put these flows either in the same project or in different ones. I don't want the Hadoop jobs to run simultaneously on the cluster, but I also want to make sure that they run alternately.

E.g. flow_1 (with hadoop_job_1) runs and finishes -> flow_2 (with hadoop_job_2) runs and finishes -> flow_1 (with hadoop_job_1) runs and finishes and so on.

And of course, I would also like to handle special conditions gracefully. E.g. if flow_1 is done but flow_2 is not ready, then flow_1 gets a chance to run again if it is ready; if flow_1 fails, flow_2 still gets its turn; and so on.

I would like to know which schedulers I could explore that are capable of doing this.

We are using MapR.

Thanks

Bhushan

1 Answer


This looks like a standard use case for Oozie. Take a look at these tutorials: Executing an Oozie workflow with Pig, Hive & Sqoop actions and Oozie workflow scheduler for Hadoop.
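As a rough illustration of how the alternating pattern could be modeled in Oozie, a coordinator for flow_2 can be made to wait until flow_1 has published its output (marked by a `_SUCCESS` flag) before triggering workflow_2; a mirror-image coordinator for flow_1 would wait on flow_2's output. This is only a minimal sketch; all app names, HDFS paths, dates, and frequencies below are hypothetical placeholders, not values from the question.

```xml
<!-- Sketch: coordinator for flow_2 that fires only after flow_1 has
     produced a _SUCCESS-flagged output directory for the current hour.
     Names, paths, and times are hypothetical placeholders. -->
<coordinator-app name="flow_2_coord" frequency="${coord:hours(1)}"
                 start="2014-01-01T00:00Z" end="2015-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="flow1_output" frequency="${coord:hours(1)}"
             initial-instance="2014-01-01T00:00Z" timezone="UTC">
      <!-- flow_1's workflow writes its results (plus a _SUCCESS flag) here -->
      <uri-template>hdfs:///data/flow1/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- flow_2 is held back until the current instance of flow_1's output exists -->
    <data-in name="input" dataset="flow1_output">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs:///apps/flow_2/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A symmetric coordinator on flow_1 waiting for flow_2's output gives strict alternation, and a `<timeout>` in the coordinator's `<controls>` section can let one flow proceed again when its partner's data never materializes, which addresses the "flow_2 is not ready" case.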

Nabeel Moidu