2

I'm using the collection requests as a queue that multiple clients read from concurrently. Each client will read one document at a time from requests and then remove it from the collection. Can I ensure that each document is read and processed by only 1 client?

The clients are written in Python with pymongo.

Neil Lunn
rgrasell
  • Optimistic locking is currently the only real way – Sammaye Jan 23 '15 at 09:01
  • @Sammaye That may be a DBA-centric approach, but fortunately MongoDB was built with the programmer in mind and can accommodate this. – Neil Lunn Jan 23 '15 at 10:35
  • @NeilLunn I have a feeling he has oversimplified his problem, especially since this is reading from a queue – Sammaye Jan 23 '15 at 10:44
  • @Sammaye I cannot see a single situation where a "remove" clause on `.findAndModify()` does not suit the situation. Even if the requirement was to "update" and not "remove" it is not possible for another process to return the same document if the criteria to modify was part of the conditions via `$ne` for example. – Neil Lunn Jan 23 '15 at 10:48
  • @NeilLunn application failover – Sammaye Jan 23 '15 at 10:48
  • @Sammaye Not really adding anything of value here. Perhaps your long answer where you qualify that is required. – Neil Lunn Jan 23 '15 at 10:49

2 Answers

3

The basic procedure here is to use .findAndModify():

Forgive that this is not Python code, but the structure is the same and it's a reasonably universal example. Three documents:

{ "_id": 1 }
{ "_id": 2 }
{ "_id": 3 }

So with the core method, you just call it with the "remove" argument for each _id. The find and the remove happen as one atomic operation on the document, so no other process can get the same document at the same time.

db.collection.findAndModify({
    "query": { "_id": 1 },
    "remove": true
})

That will either return the document that was removed or nothing at all.
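Since the clients in the question use pymongo, the same idea can be sketched in Python. This is a minimal sketch, assuming pymongo 3.0 or later, where `Collection.find_one_and_delete()` is the counterpart of `findAndModify` with "remove". The helper names `claim_next` and `drain` are hypothetical, and sorting on `_id` to approximate queue order is an assumption, not something the question specifies.

```python
def claim_next(requests):
    """Atomically remove and return the request with the lowest _id, or None."""
    # `requests` is assumed to be a pymongo Collection. The match and the
    # delete run as a single atomic operation on the server, so two clients
    # cannot be handed the same document.
    return requests.find_one_and_delete({}, sort=[("_id", 1)])


def drain(requests, handle):
    """Claim and process requests one at a time; return how many were handled."""
    processed = 0
    while True:
        doc = claim_next(requests)
        if doc is None:
            # Nothing left to claim (or other clients took the rest).
            break
        handle(doc)
        processed += 1
    return processed
```

Each client would call something like `drain(db.requests, handler)` on its own connection. Note that a document is gone from the collection as soon as it is claimed, so a client that crashes mid-processing loses that request.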


For a bit more "concurrency" proof, again excuse the node.js code here, but I'm not in a frame of mind to write some brilliant "Twisted" type code as a quick example. It serves the purpose of a concurrency test though:

var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var testSchema = new Schema({
  "_id": Number,
});

var Test = mongoose.model( 'Test', testSchema, 'test' );

mongoose.connect('mongodb://localhost/async');

async.series(
  [
    // Clear test collection
    function(callback) {
      Test.remove({},callback)
    },

    // Insert some data
    function(callback) {
      async.each([1,2,3],function(num,callback) {
        Test.create({ "_id": num },callback);
      },callback);
    },

    // Now run test in parallel
    function(callback) {
      async.each([1,1,2,2,3,3],function(num,callback) {
        Test.findOneAndRemove(
          { "_id": num },
          function(err,doc) {
            if (err) return callback(err);  // stop here so callback is not called twice
            console.log( "Removing: %s, %s", num, doc );
            callback();
          }
        );
      },callback);
    }
  ],
  function(err) {
    if (err) console.error(err);  // report any failure before exiting
    process.exit();
  }
);

And the results (in possibly varying order):

Removing: 3, { _id: 3, __v: 0 }
Removing: 1, { _id: 1, __v: 0 }
Removing: 3, null
Removing: 1, null
Removing: 2, { _id: 2, __v: 0 }
Removing: 2, null

So out of the six attempts run here, two per document, only three actually succeeded and returned the document that was pulled off the queue.

That's the principle for ensuring the result you want.
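For the Python side, the same race could be reproduced with threads. This is only a rough sketch: `race_for_documents` is a hypothetical helper, `collection` is assumed to be a pymongo Collection (pymongo clients are thread-safe), and `find_one_and_delete()` requires pymongo 3.0 or later.

```python
from concurrent.futures import ThreadPoolExecutor


def race_for_documents(collection, ids):
    """Try to remove every _id in `ids` concurrently.

    Returns a list of (id, removed_document_or_None) pairs in input order.
    """
    def attempt(doc_id):
        # Only one caller per _id can get the document back; the rest get None.
        return doc_id, collection.find_one_and_delete({"_id": doc_id})

    # One worker per attempt so the calls overlap as much as possible.
    with ThreadPoolExecutor(max_workers=max(1, len(ids))) as pool:
        return list(pool.map(attempt, ids))
```

Called with `[1, 1, 2, 2, 3, 3]` against a collection holding the three documents above, exactly one attempt per `_id` should come back with a document and the other with `None`.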

Neil Lunn
1

Looks like you're looking for

db.requests.findAndModify()

According to the documentation, if you use this with a unique index on the main field you should end up in a good place.

http://docs.mongodb.org/manual/reference/method/db.collection.findAndModify/

Malcolm Murdoch