
At first, I used Linux cron to schedule jobs. As the jobs and the dependencies between them grew, it became hard to maintain.

For example,

0 4 * * 1-5 run-job-A
10 4 * * 1-5 run-job-B
15 4 * * 1-5 run-job-C

job-B must run after job-A is done, and job-C must run after both job-A and job-B are done. I assume job-A finishes within 10 minutes and job-B within 5 minutes, so I schedule job-B at 4:10 and job-C at 4:15.

[Figure: Job DAG]

As you can see, I calculate the DAG's critical path and processing times manually. It's tedious, and it's easy to get wrong as the number of jobs grows.

Is there a better way to schedule these jobs? I am looking for a common, universal tool to handle these jobs.

gzc

3 Answers


If your jobs are linear and don't run at random times, I would suggest calling all of these jobs from a single shell script; you can keep adding jobs to that script regardless of how many there are. After all, you never know how long one job will take to execute, given system conditions like an idle system, a system at medium utilization, or a system at high utilization. Let me know your thoughts.
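A minimal sketch of such a wrapper, assuming the jobs are ordinary commands named run-job-A, run-job-B and run-job-C as in the question (the script name and path are made up):

#!/bin/sh
# run-all-jobs.sh -- run the jobs strictly in dependency order;
# "set -e" aborts the whole chain as soon as one job fails.
set -e
run-job-A
run-job-B
run-job-C

The three timed crontab entries then collapse into a single one, for example:

0 4 * * 1-5 /path/to/run-all-jobs.sh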

Shailesh Sutar
  • In fact, that's what I did. Some of my jobs are shell scripts that contain other jobs. I am looking for a common, universal tool to handle these jobs. – gzc Nov 24 '16 at 14:17

So what would go wrong if you just did

0 4 * * 1-5 run-job-A && run-job-B && run-job-C

Then B would only run after A has finished successfully, and C only after B has finished successfully.
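A side note, purely hypothetical because in the question job-B already depends on job-A: if two jobs were independent and only a third needed both of them, plain shell job control could run the independent branches in parallel and still gate the final job. Something like this sketch (the job names run-job-X/Y/Z are invented for illustration):

#!/bin/sh
# Hypothetical DAG: X and Y are independent, Z needs both to succeed.
run-job-X & x=$!
run-job-Y & y=$!
wait "$x"; xs=$?
wait "$y"; ys=$?
[ "$xs" -eq 0 ] && [ "$ys" -eq 0 ] && run-job-Z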

Just wondering. :)

Janne Pikkarainen

Great question, and you are not alone. In the HPC community this is a common problem, because jobs can have variable run times and yet there is a strong dependency ordering among them. I would look at what those folks are doing for inspiration. For example, OpenLava is an open-source scheduler that explicitly caters for dependency mapping.
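To give a rough idea of what that looks like: OpenLava descends from LSF, where dependencies are attached to jobs at submission time via bsub's -w option. Roughly (treat the exact syntax as an assumption and check the OpenLava documentation; the job names are made up):

bsub -J jobA run-job-A
bsub -J jobB -w "done(jobA)" run-job-B
bsub -J jobC -w "done(jobA) && done(jobB)" run-job-C

The scheduler then decides when each job actually starts, instead of you hard-coding time offsets in crontab.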