I am pretty new to Hadoop, and particularly to Hadoop Job Scheduling. Here is what I am trying to do.
I have 2 flows, each containing one Hadoop job. I am free to put these flows either in the same project or in different ones. I don't want the Hadoop jobs to run simultaneously on the cluster, and I also want to make sure that they run alternately.
E.g. flow_1 (with hadoop_job_1) runs and finishes -> flow_2 (with hadoop_job_2) runs and finishes -> flow_1 (with hadoop_job_1) runs and finishes and so on.
And of course, I would also like to handle special conditions gracefully. E.g. if flow_1 is done but flow_2 is not ready, flow_1 gets a chance to run again if it is ready; if flow_1 fails, flow_2 still gets its turn, etc.
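To make the intended behaviour concrete, here is a minimal Python sketch of the policy I have in mind. It is not tied to any real scheduler: `pick_next`, `run_alternating`, `is_ready` and `run_job` are hypothetical names I made up for illustration, and `run_job` stands in for submitting a Hadoop job and waiting for it to finish.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def pick_next(flows: Sequence[str],
              last_run: Optional[str],
              ready: Dict[str, bool]) -> Optional[str]:
    """Return the flow that should run next, or None if no flow is ready."""
    if not flows:
        return None
    # Start looking just after the flow that ran last, so the other flow
    # has priority; fall back to the first flow if nothing has run yet.
    start = (flows.index(last_run) + 1) % len(flows) if last_run in flows else 0
    for offset in range(len(flows)):
        candidate = flows[(start + offset) % len(flows)]
        if ready.get(candidate, False):
            return candidate
    return None


def run_alternating(flows: Sequence[str],
                    is_ready: Callable[[str], bool],
                    run_job: Callable[[str], bool],
                    max_runs: int) -> List[Tuple[str, bool]]:
    """Run one job at a time, alternating between flows where possible."""
    history: List[Tuple[str, bool]] = []
    last_run: Optional[str] = None
    for _ in range(max_runs):
        ready = {flow: is_ready(flow) for flow in flows}
        flow = pick_next(flows, last_run, ready)
        if flow is None:
            # Nothing is ready; a real driver would wait and poll again.
            break
        try:
            succeeded = run_job(flow)  # blocks, so jobs never overlap
        except Exception:
            succeeded = False
        history.append((flow, succeeded))
        # The turn passes on even after a failure, so the other flow
        # still gets its chance.
        last_run = flow
    return history
```

For example, with both flows always ready, `run_alternating(["flow_1", "flow_2"], lambda f: True, lambda f: True, 4)` runs flow_1, flow_2, flow_1, flow_2 in that order. What I am looking for is a scheduler that gives me this behaviour without having to write and babysit such a driver myself.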
I would like to know which schedulers are capable of doing this, so that I can explore them.
We are using MapR.
Thanks