7

Suppose I have a collection where I create a unique index on a field:

db.users.createIndex({username: 1}, {unique:true})

What happens if two documents with the same username are SIMULTANEOUSLY being inserted into the collection?
How does the database prevent the collision? I mean, which one gets inserted and which results in an error?
Assuming the inserts are really SIMULTANEOUS, there is no way for the database to know that two duplicates are being inserted, right?
So, what's really going on?
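
For reference, when the writes happen to be serialized the outcome is clear; a minimal shell sketch (the username value is made up) of what the losing insert would see:

    db.users.insert({username: "alice"})   // first insert succeeds: nInserted: 1
    db.users.insert({username: "alice"})   // second insert fails with something like:
    // E11000 duplicate key error index: test.users.$username_1 dup key: { : "alice" }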

Core_dumped
  • In simple terms, whichever insert gets the write lock first wins, and the other insert results in an error. – JohnnyHK May 13 '15 at 16:20
  • @JohnnyHK, Are you saying that getting the write lock at the same instant in time is impossible? – Core_dumped May 13 '15 at 16:22
  • Right, that's the whole concept of an atomic write lock. – JohnnyHK May 13 '15 at 16:23
  • This is why unique indexes cannot yet exist outside of the shard key in sharded environments atm – Sammaye May 13 '15 at 16:24
  • @JohnnyHK, but this leads to the conclusion that if I have 1,000,000 (yes, 1M)+ writes per second, they would be very slow and create a huge queue which will fill up the RAM till everything crashes? – Core_dumped May 13 '15 at 16:26
  • Ok misread, thought you meant 1000000m not 1m. Well, locks in MongoDB are latches, which means a write to an index takes less than a microsecond in most cases – Sammaye May 13 '15 at 16:28
  • You must also remember that MongoDB will write to the fsync queue before disk unless you have set it not to, so it doesn't have the IO overhead – Sammaye May 13 '15 at 16:30
  • @Sammaye, even if it takes 10^-2 milliseconds to do a write, a million writes would take 10^4 ms, which is 10 seconds, and we are assuming 1M writes per second. – Core_dumped May 13 '15 at 16:40
  • Data is applied in memory first, so obtaining a lock is a matter of nanoseconds. And 1M writes most likely wouldn't be feasible on a single server. In a sharded environment the new doc would be sent to the shard with the matching key range. If done right, your writes would be more or less evenly distributed (fuzzy). Let's assume you have 10 shards; this would translate to 100k writes / shard / s. Even if taking and lifting a lock took 10ns each, that would be 2M nanoseconds or 1/500 of a second. Plenty of time left for the other stuff. – Markus W Mahlberg May 13 '15 at 16:44
  • @SylvainLeroux Thanks for calling us to order. Will do. – Markus W Mahlberg May 13 '15 at 16:47
  • Nah, as Markus rightly said, you are looking at waaaaay less than that; microseconds for one are smaller than milliseconds, and tbh, as he says, the write is more like nanoseconds – Sammaye May 13 '15 at 16:59
  • @SylvainLeroux I have added an answer as a community wiki. – Markus W Mahlberg May 13 '15 at 17:35

1 Answer

7

Writes cannot be applied to the dataset simultaneously. When a write is sent to a MongoDB instance, be it a shard or a standalone server, here is what happens:

  1. A collection-wide write lock (which resides in RAM) is requested
  2. When the lock is granted, the resulting data to be written (be it an update, an upsert or a new document) is checked against the unique indices (which usually reside in RAM)
  3. If there is no collision, the data is applied to the dataset in RAM
  4. The lock is released. Only now can other writes start applying changes to the data in memory.
  5. With the default write concern, the query returns at this point
  6. After commitIntervalMs, the data is written to the journal
  7. Only after syncInterval seconds (60 by default) is the journal applied to the data files
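
To make steps 5 and 6 concrete, here is a minimal mongo shell sketch (2.6-era API; the documents are made up). With the default write concern the insert is acknowledged right after the in-memory apply; j: true defers the acknowledgement until the journal commit:

    // Acknowledged after step 5: the document has been applied in RAM.
    db.users.insert({username: "dave"})

    // Acknowledged only after step 6: waits for the journal commit,
    // trading latency for durability.
    db.users.insert({username: "erin"}, {writeConcern: {w: 1, j: true}})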

That being said, we can look at the actual values. 1 million writes / second seems a bit much for a single server (simply because the mass storage can't handle it), so we assume a sharded cluster with 10 shards and a shard key which distributes the writes more or less evenly. As we have seen above, all operations are applied in RAM. With today's hardware, some 3.5 billion instructions/s can be processed, or 3.5 instructions per nanosecond. Let's assume getting and releasing a lock each take 35 instructions, or 10 nanoseconds. So locking and unlocking would take 20 nanoseconds for each of our 100k writes per shard, altogether 1/500 of a second.

That would leave 499/500 of a second, or 998,000,000 nanoseconds, for the other stuff MongoDB needs to do, which translates to a whopping 3.493 billion instructions.
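
The same back-of-the-envelope arithmetic, spelled out as plain shell JavaScript (the 3.5 instructions/ns and 35-instruction lock cost are the assumptions from above):

    var instrPerNs     = 3.5;                         // ~3.5 billion instructions/s
    var lockNs         = 35 / instrPerNs;             // 10 ns to take OR release the lock
    var writesPerShard = 1000000 / 10;                // 1M writes/s spread over 10 shards
    var lockOverheadNs = writesPerShard * 2 * lockNs; // 2,000,000 ns = 1/500 of a second
    var leftoverNs     = 1e9 - lockOverheadNs;        // 998,000,000 ns
    var leftoverInstr  = leftoverNs * instrPerNs;     // ~3.493 billion instructions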

The locks preventing concurrent writes are far from being the limiting factor for write operations. Syncing the changes to the journal and the data files is usually the limiting factor, followed by too little RAM to keep the indices and the working set in memory.

Markus W Mahlberg
  • I agree with the top part of your answer, but I disagree with estimating the locking and unlocking speed based on the CPU's ability to process X billion instructions per second. As you correctly stated, the actual CPU instructions involved take negligible time. But that is neither here nor there. You should be looking at memory read & write speed instead. – Lakey May 13 '15 at 17:39
  • To be honest with you: this was just a wild guess. But the relation between acquiring and releasing a lock and the actual manipulation of the data should be about right, which was my point. Since I posted this as a community wiki answer on purpose, feel free to edit it according to the actual values. – Markus W Mahlberg May 13 '15 at 17:43
  • The amount of time required to take the lock is based on CPU speed as well as the number of CPUs. More CPUs == longer time to acquire the lock. Also, collection level locks were introduced in MongoDB 3.0 (MMAPv1); in previous versions (2.2/2.4/2.6) it was database level, and in 2.0 and earlier it was global. There are also some inaccuracies in how data is actually written in memory. There is a concept of a shared view and a private view of the data. Updates all take place in the shared view and the shared view is updated atomically. – Pete Garafano May 13 '15 at 17:46
  • @PeteGarafano it is a community wiki answer, your insights would be a great addition (and I don't grind rep with it, since no one gets rep for wiki answers) – Markus W Mahlberg May 13 '15 at 17:51
  • @MarkusWMahlberg, does this also imply that there is NO way that `db.users.update({_id:3110, likes:{$exists:false}}, {$inc:{likes:1}})` could result in 2 likes for user with id 3110? Since writes cannot be simultaneous? (see the sketch after these comments) – Core_dumped May 13 '15 at 20:15
  • As a reference, this is a pretty good complement: http://stackoverflow.com/a/17459488/383478 – Sammaye May 13 '15 at 21:13
  • @MarkusWMahlberg, I have read the answer maybe 5 times but still have some doubts: if a write is blocking, as far as I can tell it must take way more than 20 nanoseconds to occur, because the database needs to determine the new position inside all the indexes of a collection (and to do so it should probably read from disk: huge index tables can't be stored in RAM). And then it should also write the new document itself to disk, otherwise reads would be performed from data mixed between disk and RAM? How can this possibly be? Also, if a write can take 20ns, why can't a read? – Core_dumped May 14 '15 at 04:44
  • Well, you obviously didn't read it too well. The 20ns were an example for acquiring and releasing **the lock** and have *nothing* to do with read times or write times - the latency there is much higher. You are right, at a certain point indices can not be kept in RAM *on a single machine*. But usually the sheer data size forces you to shard. Also, smart modelling usually helps a lot. – Markus W Mahlberg May 14 '15 at 05:23
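
As a footnote to the `$exists` question in the comments above, a small sketch of why the guarded update cannot yield 2 likes: the two updates are serialized by the write lock, and the second one no longer matches (the _id value is taken from the comment):

    db.users.insert({_id: 3110})
    // Two racing clients both issue:
    db.users.update({_id: 3110, likes: {$exists: false}}, {$inc: {likes: 1}})
    // Whichever update is serialized first matches and creates likes: 1.
    // For the other, `likes` now exists, so the query matches nothing:
    // WriteResult({ "nMatched": 0, "nUpserted": 0, "nModified": 0 })
    db.users.findOne({_id: 3110})   // { "_id": 3110, "likes": 1 }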