
At my work, we use cron heavily to automate many system administration tasks, from backups to report generation. The problem is that our system of 50+ cron jobs is starting to collapse under its own complexity. Let me describe our setup a bit:

  • ~15 developers, some of whom are responsible for cron jobs running from their personal crontabs
  • 30+ machines, some of which are running cron jobs, sometimes owned by several people
  • Many cron jobs are not being logged, and all of their stdout and stderr are being piped to /dev/null (to my chagrin)
  • Some cron jobs are too noisy, spitting out superfluous volumes of text that make the emails from cron a pain to sift through
  • Most cron jobs, if they are monitored at all, report to a group email alias, so many people see messages that are not relevant to them and become conditioned to ignore them
  • Often cron jobs fail, and we don't notice in time
  • Some cron jobs are being tracked by our backup system, others not. No source control.
  • When one of our servers goes down, any cron jobs stored in users' crontab files on that machine do not run, and we don't realize that they failed to run

Ideally we want a set-up or software system where:

  • Any developer can go in and tweak/fix a cron job easily, without jobs being tied to personal crontabs
  • We have flexibility about which machine a cron job runs on, even if the crontab is somehow centralized on a particular machine
  • All successful cron job runs are logged succinctly, so we know that something happened
  • All errors are trapped and reported to a fine-grained list of relevant developers based on the error message and the cron job
  • Users can be set up to monitor certain cron jobs, whether they succeed or fail
  • Users can receive a summary (email or webpage) that details which jobs have failed and which jobs have succeeded in a particular window of time
  • Logging of cron job stats (run time, exit status, output volume) using something like RRDtool for analysis
  • Robustness: One server going down doesn't clobber the entire cron job system
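
As a rough sketch of what the logging and stats requirements above might look like in practice: a small wrapper that runs a job, appends one succinct JSON line per run (run time, exit status, output volume), and only prints the job's output when it fails, so that cron's email stays quiet on success. This is not one of the existing tools; the function name `run_job`, the record fields, and the log format are all assumptions made for illustration.

```python
import json
import subprocess
import sys
import time


def run_job(name, command, log_path):
    """Run `command` (a list of arguments), append a one-line JSON record
    to `log_path`, and return that record as a dict."""
    start = time.time()
    # Capture stdout and stderr together so nothing is thrown away.
    proc = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    record = {
        "job": name,                                   # hypothetical field names
        "started": int(start),
        "run_time_s": round(time.time() - start, 3),
        "exit_status": proc.returncode,
        "output_bytes": len(proc.stdout),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    if proc.returncode != 0:
        # cron mails whatever the job prints, so only failures generate email.
        sys.stdout.write(proc.stdout.decode(errors="replace"))
    return record


# Hypothetical command-line form: cron_wrap.py JOB_NAME LOG_FILE COMMAND [ARGS...]
if __name__ == "__main__" and len(sys.argv) >= 4:
    result = run_job(sys.argv[1], sys.argv[3:], sys.argv[2])
    sys.exit(result["exit_status"])
```

A crontab entry could then call the wrapper instead of the job itself, for example `0 2 * * * cron_wrap.py nightly-backup /var/log/cron_jobs.log /opt/scripts/backup.sh` (the script name and paths here are made up). The per-run records could feed a summary page or RRDtool, but the sketch does not address routing errors to specific developers or surviving a server outage.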

Searching online, I see some discussion of "cron job best practices", but it only seems to address some of our requirements. In terms of software support for some of these features, it seems that there are tools like cronic, shush, and cronwrap (sorry, I'm a new user and limited to 2 hyperlinks). I'm sure there are more that I'm missing.

I could code up something like this myself, but surely something similar has been created already. Any advice on existing systems/methodologies, or pointers on how to construct such a system, would be greatly appreciated.

BenMorel
taltman
  • possibly related: http://stackoverflow.com/questions/1914884/distributed-job-scheduling-management-and-reporting – colllin Dec 04 '12 at 22:59
  • Have you ever tried using a Continuous Integration server like Hudson / Jenkins? – Harshavardhan Konakanchi Dec 05 '12 at 04:46
  • @collindo: I've worked with batch submission systems before like Condor and qsub. They have some nice monitoring features, but they otherwise do not provide cron-like features, nor logging or analytics. – taltman Dec 12 '12 at 08:54
  • @Harsha: I haven't used a Continuous Integration server before. My understanding is that it runs a build and a test suite upon every code commit. Could you explain why you think this would be a good solution to the requirements that I posted? – taltman Dec 12 '12 at 08:55

1 Answer


I'm not an expert on this topic, but I hope this helps. I recently heard of these technologies:

Job scheduler, Work load automation solutions, and this list of job scheduler software.

I have no practical experience with these, but as I understand it, job schedulers and workload automation software are enterprise-level tools that are used in SOA or Enterprise Integration Architectures and can usually be integrated with ERP systems.

Honestly, I'm not sure whether this technology is the right tool for your needs; you would need to research these topics in depth. I hope this response expands your "solutions panorama".

Miguel A. Carrasco
  • Thanks for the reference. I think that "workload automation" is in the right direction, though all of the examples seem to be these heavy-weight Enterprise-class corporate solutions. Ideally, I'm looking for something for managing cron complexity that is Unixy, script-friendly, and open-source. – taltman Dec 12 '12 at 09:02