
I've set up load balancers: lb1 (active) and lb2 (passive), GlusterFS-replicated web servers: web1 (active), web2 (backup), and a clustered database. DNS A records for both the web servers and the database point to the VIP of the load balancers.

Both web servers have their own copy of the cron jobs. Assume the following tasks:

    * * * * * echo $(hostname) >> crontab.txt
    0 0 1 * * ~/bin/another/task 2>&1

With a lock algorithm along these lines:

    lock_dir=~/.cronlock
    pid_file=$lock_dir/pid

    # mkdir is atomic, so exactly one of two racing processes succeeds
    if mkdir "$lock_dir" 2> /dev/null; then
        echo $$ > "$pid_file"
        # clean the lock up if we are interrupted or exit early
        trap 'rm -rf "$lock_dir"' INT TERM EXIT

        # Crons

        rm -rf "$lock_dir"
        trap - INT TERM EXIT
    fi

Is it safe to have something like

    * * * * * ./lock_algorithm -f LOCK_FILE1 -c "echo $(hostname) >> crontab.txt"
    0 0 1 * * ./lock_algorithm -f LOCK_FILE2 -c "~/bin/another/task 2>&1"

where I pass a per-cron-command unique lock file name and the command to be executed?
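(For what it's worth, here is a rough sketch of what such a wrapper could look like, written as a shell function around `flock(1)`. Whether the lock is actually visible from both hosts depends on the shared filesystem honouring locks across the cluster, so treat this as an illustration rather than a verified cross-host mutex; all paths are placeholders.)

```shell
#!/bin/sh
# Sketch of a lock_algorithm-style wrapper around flock(1).
# Assumes the lock file sits on storage shared by web1 and web2
# (e.g. the Gluster mount) and that the filesystem honours the lock
# across hosts - both are assumptions, not verified facts.

run_locked() {
    lock_file=$1
    shift
    # -n: non-blocking. If the lock is already held (possibly by the
    # other web server), skip this run instead of queueing behind it.
    flock -n "$lock_file" -c "$*"
}

# Crontab usage (paths are placeholders), after sourcing this file:
# * * * * * . /shared/bin/locklib.sh; run_locked /shared/locks/host.lock 'echo $(hostname) >> crontab.txt'
```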

By "safe" I mean that either web1 or web2 will run each job, not both.

And what if I need cron overlap (e.g. every minute I perform a long task limited to the current minute)? How do I get web1's cron to execute again, assuming web1 is the active "cron runner"?

  • Also see [Running crons on a scaled mirrored servers (cron logic needed)](http://unix.stackexchange.com/a/242693/100397) – roaima Nov 14 '15 at 23:43
  • Across two (or more) hosts a PID isn't unique. Will `lock_algorithm` know that the PID refers to a process on the other host? How will it know when the lock is stale? – roaima Nov 14 '15 at 23:46
  • @roaima I have already seen that question. The main issue is the need for manual intervention in the event of a main web server outage. – Gabriel Santos Nov 14 '15 at 23:47
  • What defines "main" web server in this context? More usefully here, can you use that criterion to determine on which host the `cron` jobs can run? – roaima Nov 14 '15 at 23:48
  • @roaima I don't think so. Web servers are picked by the load balancer in an active/passive approach, so the active server does not know that it is the active one. HAProxy can send additional headers with served web pages, but not to system/cron. – Gabriel Santos Nov 14 '15 at 23:52
  • I could ping each server and check which one is down, but I don't think that's a good solution. Manual intervention would be required from time to time, plus the additional delay in script execution. – Gabriel Santos Nov 14 '15 at 23:54
  • I'm familiar with load balancers. You mentioned a "main" server as in active/passive rather than active/active; you also seemed to imply a manual switchover. What do you use for your load balancer? – roaima Nov 15 '15 at 09:07
  • Why aren't you using POSIX locks in Gluster and `flock` to do this? – Matthew Ife Nov 15 '15 at 10:37

2 Answers


Not sure whether the following is possible for you, but here is an idea:

  • Not sure which cluster stack / software you're using, but you could introduce Pacemaker and Corosync on web1 and web2 and use OCF resource agents for this. To give you an idea of what this is about:

    primitive p_postfix ocf:heartbeat:postfix \
      params config_dir="/etc/postfix" \
      op monitor interval="10"
    primitive p_symlink ocf:heartbeat:symlink \
      params target="/srv/postfix/cron" \
        link="/etc/cron.d/postfix" \
        backup_suffix=".disabled" \
      op monitor interval="10"
    primitive p_cron lsb:cron \
      op monitor interval=10
    order o_symlink_before_cron inf: p_symlink p_cron
    colocation c_cron_on_symlink inf: p_cron p_symlink
    colocation c_symlink_on_postfix inf: p_symlink p_postfix
    
  • What this will do is the following:

    • Check whether a file named postfix already exists in /etc/cron.d.
    • If it does, rename it to postfix.disabled (remember, cron ignores job definitions with dots in the filename).
    • (Re-)Create the postfix job definition as a symlink to /srv/postfix/cron.
    • Restart cron when it's done.
  • This example is from an active/passive cluster running Postfix. Cron gets executed only on the active Postfix node.

  • You could adapt this by removing Postfix and including your web server instead.


Edit: If the above is "too much" for you, here is another idea: you could enable HAProxy's stats page, fetch it from your script, parse it, and act according to the hostname and the status reported by HAProxy.
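That could look roughly like the sketch below, using the CSV flavour of the stats page. The stats URL and the backend name `www` are placeholders for your own configuration, and note that the CSV only reports UP/DOWN per server, so mapping that to "active" in an active/backup setup is still up to you:

```shell
#!/bin/sh
# Sketch: decide whether this host should run its cron jobs by asking
# HAProxy's stats page (CSV flavour). The URL and the backend name
# "www" below are placeholders for your own setup.

# In HAProxy's CSV output, field 1 is the proxy (backend) name,
# field 2 the server name, and field 18 the status (UP/DOWN/MAINT...).
first_up_server() {
    awk -F, '$1 == "www" && $18 == "UP" { print $2; exit }'
}

# Crontab-side usage (sketch; stats URL is a placeholder):
# active=$(curl -s 'http://lb1:8404/haproxy?stats;csv' | first_up_server)
# [ "$active" = "$(hostname -s)" ] && exec "$@"
```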

gxx

It seems like you're trying to create a semaphore that works across servers. While someone has tried to build that, I don't see it being production-ready. Rather than pushing the technological envelope, it might be better to refactor your problem into something that fits available technology.

Does your website have a database? You could use that for coordination.
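As a sketch of that idea (assuming MySQL; the table, connection details, and helper name are all made up for illustration): give each job one row per period in a table with a unique key, so the `INSERT` can only succeed on one host and the loser skips the job.

```shell
#!/bin/sh
# Sketch of database-backed cron coordination. Assumes MySQL and a
# hypothetical table like:
#   CREATE TABLE cron_runs (
#     job    VARCHAR(64) NOT NULL,
#     period DATETIME    NOT NULL,
#     host   VARCHAR(64) NOT NULL,
#     UNIQUE KEY (job, period)
#   );
# The unique key means the INSERT succeeds on exactly one host per
# period; the other host gets a duplicate-key error and skips the job.

: "${MYSQL:=mysql -h db-vip -u cronuser}"   # placeholder connection

claim_run() {
    job=$1
    period=$(date +'%Y-%m-%d %H:%M:00')     # one claim per minute
    $MYSQL -e "INSERT INTO cron_runs (job, period, host)
               VALUES ('$job', '$period', '$(hostname)');" 2>/dev/null
}

# Crontab usage (sketch):
# * * * * * claim_run hostname_log && echo $(hostname) >> crontab.txt
```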

If not, how about using a distributed queueing system like Kafka or ZeroMQ?

chicks