0

I have a 3 beanstalkd queue processes running on the same ip but different ports. I have a separate server running supervisor spawning php workers in parallel (20 per beanstalkd port) to process the queue. My problem is that it seems like two processes can reserve the same job id on the same server in the same time.

Here is some sample output from my log:

2017-02-23 09:59:56 --> START JOB (port: 11301 | u: 0.45138600 1487861996 | jid:1695074 | pid:30019 | j:leads_to_tags_add | tr:1)
2017-02-23 09:59:57 --> START JOB (port: 11301 | u: 0.55024800 1487861997 | jid:1695074 | pid:30157 | j:leads_to_tags_add | tr:2)
2017-02-23 09:59:58 --> DEL   JOB (port: 11301 | u: 0.54731000 1487861998 | jid:1695074 | pid:30019 | j:leads_to_tags_add)
2017-02-23 09:59:58 --> DEL   JOB (port: 11301 | u: 0.58927900 1487861998 | jid:1695074 | pid:30157 | j:leads_to_tags_add)

It seems the two reserves happen right after each other, and the second reserve happens before the first process finishes and deletes the job.

I added a counter in redis per jobid and it's clear that by the time it reserves the second time, the counter went up once (tr). The TTRR is set to 3600, so there is no way that it expires before the first process completes.

This is what the job status looks like right after the second process reserve:

Pheanstalk\Response\ArrayResponse::__set_state(array(
   'id' => '1695074',
   'tube' => 'action-medium',
   'state' => 'reserved',
   'pri' => '0',
   'age' => '1',
   'delay' => '0',
   'ttr' => '3600',
   'time-left' => '3599',
   'file' => '385',
   'reserves' => '2',
   'timeouts' => '0',
   'releases' => '0',
   'buries' => '0',
   'kicks' => '0',
))

This behaviour is very random, sometimes only one process will be able to reserve until the job locks, sometimes 2, sometimes even 4 or more (rarely). Of course this creates an inconsistent number of duplicate jobs being executed.

Short version of the code:

$this->job = $this->pheanstalk->watch($tube)->reserve($timeout);


set_error_handler(function($errno, $errstr, $errfile, $errline, array $errcontext) {
    // error was suppressed with the @-operator
    if (0 === error_reporting()) {
        return false;
    }

    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});

$this->log_message('info', __METHOD__ . ": START JOB (" . $this->_logDetails() . " | tr:{$tries})");


if ($this->_process_job()) {
    $this->log_message('info', __METHOD__ . ": FINISHED JOB (" . $this->_logDetails() . ")");
    $this->_delete_job();
} else {
    $this->log_message('error', __METHOD__ . ": FAILED JOB (" . $this->_logDetails() . ")");
}

restore_error_handler();

and

protected function _delete_job()
{
    $this->pheanstalk->delete($this->job);
    $this->log_message('info', __METHOD__ . ": DELETED JOB (" . $this->_logDetails() . ")");
}
pierdevara
  • 406
  • 2
  • 9
  • Please post your code where you `reserve` the job, send it to be worked on,, and then `delete` it when complete. – Alister Bulman Feb 23 '17 at 19:04
  • @AlisterBulman updated with some code – pierdevara Feb 23 '17 at 22:13
  • Does the issues still occur with only one Beanstalkd server? Do all the servers have different configs for the binary logging etc, and why use more than one anyway? – Alister Bulman Feb 23 '17 at 22:32
  • Currently it's just a test for horizontal scaling of beanstalk. In case we ever need to add extra servers in the mix. In a real world scenario the beanstalk processes would be on different servers, not just different ports. The different processes are pointing to different bin logs. I will test leaving just one beanstalk process on. – pierdevara Feb 24 '17 at 05:38
  • @AlisterBulman Turned off the other two, and the issue still occurs with a single beanstalk process. – pierdevara Feb 24 '17 at 10:56
  • I never heard of this issue, could you let me know what kernel and linux distro you use. It might be from how it was compiled. Try reinstalling from another repo. – Pentium10 Feb 24 '17 at 12:53

1 Answers1

0

The issue was a TCP disconnection happening within the process_job bit of my code. The pheanstalk object would get overwritten somewhere down the execution chain.

I have a library wrapping pheanstalk which checks for the emptiest beanstalkd server available before sending a put command. It would then create a new Pheanstalk instance with the best server's details, save it as the current connection, and then dispatch the command.

Sometimes the worker can send a subjob to the queue. The library would be loaded in the worker initially to fetch a job, and again in process_job to send to the queue. Due to the dependency injection in Codeigniter, that library would refer to the same object, and inside process_job it would overwrite the worker's current connection and cause a TCP disconnect.

pierdevara
  • 406
  • 2
  • 9