
I want to set up a job queue with multiple workers. Right now I am looking at beanstalkd, but I believe this is more of a conceptual problem: how can you ensure that jobs related to a single entity are handled in order?

Let's say the workers manage an email platform for some UI. For a given mailbox, jobs need to be performed serially. For example, sometimes a user will want to re-push their password into the mail platform while troubleshooting. So, they change their password, then change it back right away. That's two password-change jobs submitted to beanstalkd.

Now, most of the time this will go fine, as beanstalkd will hand those jobs out to workers in order. However, some transient error like a DNS lookup delay could cause the second password change (back to the proper one) to go through before the first, leaving the mailbox with an incorrect password.

I have thought about introducing semaphores/mutexes and having a 1:1 worker-machine:beanstalkd-server ratio, but even that would only work if lock requests are granted in the order requested, which doesn't seem fully reliable. Having a queue per entity opens up some other options, but this needs to support hundreds of thousands of entities.

Judging by how little discussion I've found on this topic, it must not be as common a scenario as I initially thought. Does anyone have experience dealing with this problem?

Justin McAleer

1 Answer


A couple of potential methods come to mind.

  1. As you point out, unless you change job priorities, beanstalkd is a FIFO queue. So if only one worker handles password changes, it will process the jobs in order.
  2. If there are multiple workers, you could store metadata alongside the password: a last-modified time (more exactly, the time the password change request was made). That time would be set from the job. If the time already in the database (alongside the password) is newer than the one in the incoming request, the incoming request is dropped as out of date.

Depending on the user data storage, you may need additional locking around the database (with an SQL database, this is quite easy, but a file-based store would need additional locking to avoid potential file corruption).
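As a rough illustration of the second method, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `mailboxes`, its columns, and the helper `apply_password_change` are all hypothetical, and it assumes every request timestamp comes from a single consistent clock (for example, the machine that submits the jobs). The comparison and the write happen in one conditional `UPDATE`, so the database does the locking.

```python
import sqlite3


def apply_password_change(conn, mailbox, new_password, requested_at):
    """Apply a change only if it is newer than the stored one.

    Returns True if the row was updated, False if the job was stale.
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE mailboxes "
            "SET password = ?, password_requested_at = ? "
            "WHERE name = ? AND password_requested_at < ?",
            (new_password, requested_at, mailbox, requested_at),
        )
    return cur.rowcount == 1


# Hypothetical schema: one row per mailbox, with the request time
# stored alongside the password.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mailboxes ("
    "name TEXT PRIMARY KEY, "
    "password TEXT NOT NULL, "
    "password_requested_at REAL NOT NULL)"
)
conn.execute("INSERT INTO mailboxes VALUES ('alice', 'original', 0.0)")
conn.commit()

# The two jobs reach the workers out of order: the second request
# (requested at t=2.0) is processed before the first (t=1.0).
second_applied = apply_password_change(conn, "alice", "original", 2.0)
first_applied = apply_password_change(conn, "alice", "temporary", 1.0)

final_password = conn.execute(
    "SELECT password FROM mailboxes WHERE name = 'alice'"
).fetchone()[0]
```

In this sketch the late-arriving first job is dropped, so the mailbox keeps the password from the most recent request regardless of which worker finishes first. A real worker would also have to push the change to the mail platform itself, which this example leaves out.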

Alister Bulman