
I've set up load balancers: lb1 (active) and lb2 (passive), GlusterFS-replicated web servers: web1 (active), web2 (backup), and a clustered database. DNS A records for both the web servers and the database point to the VIP of the load balancers.

Both web servers have their own copy of the cron jobs. Assume the following tasks:

    * * * * * echo $(hostname) >> crontab.txt
    0 0 1 * * ~/bin/another/task 2>&1

With a lock algorithm along these lines:

    lock_dir=~/.cronlock
    pid_file=$lock_dir/pid

    # mkdir is atomic, so exactly one of two racing processes succeeds
    if mkdir "$lock_dir" 2> /dev/null; then
        echo $$ > "$pid_file"
        # clean the lock up if we are interrupted or exit early
        trap 'rm -rf "$lock_dir"' INT TERM EXIT

        # Crons

        rm -rf "$lock_dir"
        trap - INT TERM EXIT
    fi

Is it safe to have something like

    * * * * * ./lock_algorithm -f LOCK_FILE1 -c "echo $(hostname) >> crontab.txt"
    0 0 1 * * ./lock_algorithm -f LOCK_FILE2 -c "~/bin/another/task 2>&1"

where I pass a per-cron-command unique lock file name and the command to be executed?
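(For what it's worth, here is a rough sketch of what such a wrapper could look like, written as a shell function around `flock(1)`. Whether the lock is actually visible from both hosts depends on the shared filesystem honouring locks across the cluster, so treat this as an illustration rather than a verified cross-host mutex; all paths are placeholders.)

```shell
#!/bin/sh
# Sketch of a lock_algorithm-style wrapper around flock(1).
# Assumes the lock file sits on storage shared by web1 and web2
# (e.g. the Gluster mount) and that the filesystem honours the lock
# across hosts - both are assumptions, not verified facts.

run_locked() {
    lock_file=$1
    shift
    # -n: non-blocking. If the lock is already held (possibly by the
    # other web server), skip this run instead of queueing behind it.
    flock -n "$lock_file" -c "$*"
}

# Crontab usage (paths are placeholders), after sourcing this file:
# * * * * * . /shared/bin/locklib.sh; run_locked /shared/locks/host.lock 'echo $(hostname) >> crontab.txt'
```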

By "safe" I mean that either web1 or web2 will run each job, not both.

And what if I need cron overlap (e.g. every minute I perform a long task limited to the current minute)? How do I get web1's cron to execute again, assuming web1 is the active "cron runner"?

  • Also see [Running crons on a scaled mirrored servers (cron logic needed)](http://unix.stackexchange.com/a/242693/100397) – roaima Nov 14 '15 at 23:43
  • Across two (or more) hosts a PID isn't unique. Will `lock_algorithm` know that the PID refers to a process on the other host? How will it know when the lock is stale? – roaima Nov 14 '15 at 23:46
  • @roaima I have already seen that question. The main issue is the need for manual intervention in the event of a main web server outage. – Gabriel Santos Nov 14 '15 at 23:47
  • What defines "main" web server in this context? More usefully here, can you use that criterion to determine on which host the `cron` jobs can run? – roaima Nov 14 '15 at 23:48
  • @roaima I don't think so. Web servers are picked by the load balancer in an active/passive approach, so the active server does not know that it is the active one. HAProxy can send additional headers with served web pages, but not to system/cron. – Gabriel Santos Nov 14 '15 at 23:52
  • I could ping each server and check which one is down, but I don't think that's a good solution. Manual intervention would be required from time to time, plus the additional delay in script execution. – Gabriel Santos Nov 14 '15 at 23:54
  • I'm familiar with load balancers. You mentioned a "main" server as in active/passive rather than active/active; you also seemed to imply a manual switchover. What do you use for your load balancer? – roaima Nov 15 '15 at 09:07
  • Why aren't you using POSIX locks in Gluster and `flock` to do this? – Matthew Ife Nov 15 '15 at 10:37

2 Answers


Not sure whether the following is possible for you, but here is an idea:

  • Not sure which cluster stack / software you're using, but you could introduce Pacemaker and Corosync on web1 and web2 and use OCF resource agents for this. To give you an idea of what this is about:

    primitive p_postfix ocf:heartbeat:postfix \
      params config_dir="/etc/postfix" \
      op monitor interval="10"
    primitive p_symlink ocf:heartbeat:symlink \
      params target="/srv/postfix/cron" \
        link="/etc/cron.d/postfix" \
        backup_suffix=".disabled" \
      op monitor interval="10"
    primitive p_cron lsb:cron \
      op monitor interval=10
    order o_symlink_before_cron inf: p_symlink p_cron
    colocation c_cron_on_symlink inf: p_cron p_symlink
    colocation c_symlink_on_postfix inf: p_symlink p_postfix
    
  • What this will do is the following:

    • Check whether a file named postfix already exists in /etc/cron.d.
    • If it does, rename it to postfix.disabled (remember, cron ignores job definitions with dots in the filename).
    • (Re-)Create the postfix job definition as a symlink to /srv/postfix/cron.
    • Restart cron when it's done.
  • This example is from an active/passive cluster running Postfix. Cron gets executed only on the active Postfix node.

  • You could adapt this by removing Postfix and including your web server instead.


Edit: If the above is "too much" for you, here is another idea: you could enable HAProxy's stats page, fetch it from your script, parse it, and act according to the hostname and the status reported by HAProxy.
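That could look roughly like the sketch below, using the CSV flavour of the stats page. The stats URL and the backend name `www` are placeholders for your own configuration, and note that the CSV only reports UP/DOWN per server, so mapping that to "active" in an active/backup setup is still up to you:

```shell
#!/bin/sh
# Sketch: decide whether this host should run its cron jobs by asking
# HAProxy's stats page (CSV flavour). The URL and the backend name
# "www" below are placeholders for your own setup.

# In HAProxy's CSV output, field 1 is the proxy (backend) name,
# field 2 the server name, and field 18 the status (UP/DOWN/MAINT...).
first_up_server() {
    awk -F, '$1 == "www" && $18 == "UP" { print $2; exit }'
}

# Crontab-side usage (sketch; stats URL is a placeholder):
# active=$(curl -s 'http://lb1:8404/haproxy?stats;csv' | first_up_server)
# [ "$active" = "$(hostname -s)" ] && exec "$@"
```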

gxx

It seems like you're trying to create a semaphore that works across servers. While someone has tried to build that, I don't see it being production-ready. Rather than pushing the technological envelope, it might be better to refactor your problem into something that fits available technology.

Does your website have a database? You could use that for coordination.
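As a sketch of that idea (assuming MySQL; the table, connection details, and helper name are all made up for illustration): give each job one row per period in a table with a unique key, so the `INSERT` can only succeed on one host and the loser skips the job.

```shell
#!/bin/sh
# Sketch of database-backed cron coordination. Assumes MySQL and a
# hypothetical table like:
#   CREATE TABLE cron_runs (
#     job    VARCHAR(64) NOT NULL,
#     period DATETIME    NOT NULL,
#     host   VARCHAR(64) NOT NULL,
#     UNIQUE KEY (job, period)
#   );
# The unique key means the INSERT succeeds on exactly one host per
# period; the other host gets a duplicate-key error and skips the job.

: "${MYSQL:=mysql -h db-vip -u cronuser}"   # placeholder connection

claim_run() {
    job=$1
    period=$(date +'%Y-%m-%d %H:%M:00')     # one claim per minute
    $MYSQL -e "INSERT INTO cron_runs (job, period, host)
               VALUES ('$job', '$period', '$(hostname)');" 2>/dev/null
}

# Crontab usage (sketch):
# * * * * * claim_run hostname_log && echo $(hostname) >> crontab.txt
```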

If not, how about using a distributed queueing system like Kafka or ZeroMQ?

chicks