
I've got multiple servers sharing a database - on each of them a cron job fires every 5 minutes that checks whether a text message log entry exists; if it doesn't, it creates the log entry and sends out a text message. I thought there would never be a situation where text messages are sent multiple times, as one server should always be first.

Well - I was wrong and that scenario did happen:

  • A - check if log exists - it doesn't
  • B - check if log exists - it doesn't
  • A - create log
  • B - create log
  • A - send message
  • B - send message

I've changed this behaviour to introduce a queue, which should mitigate the issue. While the crons will still fire and multiple jobs will be queued, the workers should pick up the jobs at different times, thus preventing the message from being sent twice. Though it might just as well end up being:

  • A - pick up job 1
  • B - pick up job 2
  • A - check if log exists - it doesn't
  • B - check if log exists - it doesn't

Etc., or A and B might just as well pick up the same job at exactly the same time.

The solution would be, I guess, to run one worker server. But then I have the situation where jobs from multiple servers are queued many times, and I can't check whether they're already enqueued, as that brings us back to the first scenario.

I'm at a loss on how to proceed here - while a multiple-server, single-worker-server setup will work, I don't want to end up with multiple instances of the same job (coming from different servers) in the queue.

Maybe the solution to go for is to have one cron/queue/worker server, but I don't have the experience with Laravel in a multi-server environment to set it up.

The other problematic thing for me is: how do I test this? I can't, I guess, test it locally unless there's a way to spin up VM instances that are synchronized with each other.

eithed

2 Answers


The easy answer:

The code that checks the database for the existing entry could use a database transaction with an isolation level high enough to make sure that everyone else trying to do the same thing at the same time will be blocked and will wait for the job to finish/commit.

A really naive solution (assuming MySQL) would be `LOCK TABLES entries WRITE;` followed by the logic, then `UNLOCK TABLES` when you're done.

This also means that no one can access the table while your job is doing the check. I hope the check is really quick, because you'll block all access to the table for a small time period every five minutes.

WRITE lock:

  • The session that holds the lock can read and write the table.
  • Only the session that holds the lock can access the table. No other session can access it until the lock is released.
  • Lock requests for the table by other sessions block while the WRITE lock is held.

Source: https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
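
To make this concrete, here's a minimal sketch of that naive approach, assuming MySQL and Laravel's DB facade; the entries table, the interval_start column and the message-sending step are placeholders for your actual logic:

    use Illuminate\Support\Facades\DB;

    // Start of the current 5-minute window (Unix timestamp), used as a
    // placeholder key for "has this interval already been handled?".
    $intervalStart = floor(time() / 300) * 300;

    // Naive table-level lock (MySQL): every other session is blocked from
    // touching `entries` until UNLOCK TABLES runs, so keep this short.
    DB::statement('LOCK TABLES entries WRITE');

    try {
        // Placeholder check: was a log entry already created for this interval?
        $exists = DB::table('entries')
            ->where('interval_start', $intervalStart)
            ->exists();

        if (! $exists) {
            DB::table('entries')->insert(['interval_start' => $intervalStart]);
            // ... send the text message here ...
        }
    } finally {
        // Always release the lock, even if sending throws.
        DB::statement('UNLOCK TABLES');
    }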

That was a really boring answer, so I'll move on to the answer you're probably more interested in...

The server architecture answer:

Your wish to only have one job per time interval in your queue means that you should only have one machine dispatching the jobs. This is most easily done with one dedicated machine that only dispatches jobs from scheduled commands. (Laravel 5.5 introduced the ability to dispatch jobs directly from the scheduler; see Scheduling Queued Jobs.)
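
For example, a sketch assuming Laravel 5.5+, where SendTextMessage is a hypothetical job class:

    // app/Console/Kernel.php
    protected function schedule(Schedule $schedule)
    {
        // Only the dedicated scheduler machine runs the schedule, so the
        // job is pushed onto the queue exactly once per interval.
        $schedule->job(new SendTextMessage)->everyFiveMinutes();
    }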

You can then have several worker machines processing the queue, and only one of them will pick up a given job and execute it. Two worker machines will never execute the same job at the same time if everything works as usual*.

I would split the web machines from the worker machines so that they can scale independently. I prefer having my web machines dedicated to web traffic; they do not process jobs, which makes sure that a large amount of queued jobs won't affect my HTTP response times.

So, I recommend the following machine types in your setup:

  1. The scheduler - one single machine that runs the schedule and dispatches jobs.
  2. Worker machines that handle your queue.
  3. Web machines that handle visitors' traffic.

All machines will have identical source code for your Laravel application, and they will also have identical configuration. The only thing that is unique per machine type is ...

  1. The scheduler has php artisan schedule:run in the crontab (the standard entry is sketched after this list).
  2. The workers have supervisor (or something similar) that runs php artisan queue:work.
  3. The web servers have nginx + php-fpm and handle incoming web requests.
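
For reference, the scheduler machine's crontab only needs the standard Laravel entry (the project path below is a placeholder):

    * * * * * php /path/to/your/project/artisan schedule:run >> /dev/null 2>&1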

This setup will make sure that you only get one job per 5-minute interval, since there is only one machine pushing it. It will also make sure that the CPU load generated by the workers doesn't affect the web requests.

One issue with my answer is obvious: that single scheduler machine is a single point of failure. If it dies, you will no longer have any of these scheduled jobs dispatched to the queue. That touches areas like server monitoring and health checks, which are out of scope for your question and also highly dependent on your hosting provider.


Regarding that little asterisk: I can make up weird scenarios where a job is executed on several machines. These involve jobs that sleep for longer than the timeout, while at the same time you've got an environment without support for terminating the job. This will cause the first worker to keep executing the job (since it cannot be terminated), and a second worker will consider the job timed-out and retry it.

sisve
  • Given the current setup is to have multiple workers, which "should" work (though my initial setup without queues "should" have worked as well, given that I've never seen a setup where two machines would be in absolute synchronicity with each other) - it's good to have confirmation of my thoughts, though. I'm running 5.4 and an upgrade is currently quite unlikely. The solution I'm more familiar with (albeit not in Laravel) is to keep an index in memcache of all servers, recording which one is currently the master. If the master dies, a new master is appointed, and the queue just checks whether it's being called on the master. – eithed Nov 02 '17 at 22:21

Since Laravel 5.6 you can ensure your scheduled tasks only run on a single instance by using the `onOneServer` method, e.g.

$schedule->command('loggingTask')
            ->everyFiveMinutes()
            ->onOneServer();

This requires an APC or Redis cache to be set up because it seems to use a mutual exclusion lock, probably RedisLock if Redis is set up.
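
The same mutual-exclusion idea is also exposed directly as atomic cache locks (added around the same Laravel release, and likewise requiring a lock-capable cache driver such as redis or memcached). A hedged sketch, where the lock name and the 300-second TTL are arbitrary choices:

    use Illuminate\Support\Facades\Cache;

    // Try to acquire a cross-server mutex; the TTL guards against the
    // lock being held forever if the holder crashes mid-task.
    $lock = Cache::lock('send-text-message', 300);

    if ($lock->get()) {
        try {
            // Only the first server to acquire the lock gets here.
        } finally {
            $lock->release();
        }
    }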

Using a queue you shouldn't really have such a problem because popping a task off a queue should be an atomic operation.

Source

apokryfos
  • Yup - given the queue, the operations should be atomic, unless they're happening at exactly the same tick on each of the workers (which should reaaaaaaally be unlikely). In my scenario the operation was connected to sending text messages, and I was surprised to receive multiple messages as the number of workers increased. It's excellent to see this `onOneServer` - I'd be keen to explore it if I were still working on my original project. – eithed Oct 19 '18 at 09:47