
I am pretty new to Hadoop, and particularly to Hadoop Job Scheduling. Here is what I am trying to do.

I have 2 flows, each containing a Hadoop job. I am free to put these flows either in the same project or in different ones. I don't want the Hadoop jobs to run simultaneously on the cluster, but I also want to make sure that they run alternately.

E.g. flow_1 (with hadoop_job_1) runs and finishes -> flow_2 (with hadoop_job_2) runs and finishes -> flow_1 (with hadoop_job_1) runs and finishes and so on.

And of course, I would also like to handle special conditions gracefully. E.g. if flow_1 is done but flow_2 is not ready, then flow_1 gets a chance to run again if it is ready; if flow_1 fails, flow_2 still gets its turn; and so on.

I would like to know which schedulers I could explore that are capable of doing this.

We are using MapR.

Thanks

Bhushan

1 Answer


This looks like a standard use case for Oozie. Take a look at these tutorials: Executing an Oozie workflow with Pig, Hive & Sqoop actions and Oozie workflow scheduler for Hadoop.
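As a rough illustration of how the alternating pattern could be modeled in Oozie, a coordinator for flow_2 can be made to wait until flow_1 has published its output (marked by a `_SUCCESS` flag) before triggering workflow_2; a mirror-image coordinator for flow_1 would wait on flow_2's output. This is only a minimal sketch; all app names, HDFS paths, dates, and frequencies below are hypothetical placeholders, not values from the question.

```xml
<!-- Sketch: coordinator for flow_2 that fires only after flow_1 has
     produced a _SUCCESS-flagged output directory for the current hour.
     Names, paths, and times are hypothetical placeholders. -->
<coordinator-app name="flow_2_coord" frequency="${coord:hours(1)}"
                 start="2014-01-01T00:00Z" end="2015-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="flow1_output" frequency="${coord:hours(1)}"
             initial-instance="2014-01-01T00:00Z" timezone="UTC">
      <!-- flow_1's workflow writes its results (plus a _SUCCESS flag) here -->
      <uri-template>hdfs:///data/flow1/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- flow_2 is held back until the current instance of flow_1's output exists -->
    <data-in name="input" dataset="flow1_output">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs:///apps/flow_2/workflow.xml</app-path>
    </workflow>
  </action>
</coordinator-app>
```

A symmetric coordinator on flow_1 waiting for flow_2's output gives strict alternation, and a `<timeout>` in the coordinator's `<controls>` section can let one flow proceed again when its partner's data never materializes, which addresses the "flow_2 is not ready" case.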

Nabeel Moidu