MongoDB update guarantee using w=0

Question

I have a large collection with more than half a million documents, which I need to update continuously. My first approach was to use w=1 to make sure each write is acknowledged, but that causes a lot of delay.

collection.update(
    {'_id': _id},
    {'$set': data},
    w=1
)

So I decided to use w=0 in my update call, and the updates became significantly faster.

Given my bitter past experience with MongoDB, I'm not sure whether every update is guaranteed to be applied when w=0. My question is: are updates guaranteed with w=0?

Edit: I would also like to know how this works. Does it create an internal queue and perform the updates asynchronously, one by one? I saw in mongostat that some updates were still being processed even after the Python script had quit. Or is the update instant?

Edit 2: According to Sammaye's answer, any error can cause a silent failure. But what happens under a heavy load of updates? Do some updates fail then?

You shouldn't see a big difference if this is a multithreaded ingest of data. Note that inserts are done when you see no more inserts happening in mongostat, NOT when your client returns. — Asya Kamsky, Nov 16 '14 at 23:21

Sammaye · Answer 1 · 2014-11-14T16:22:19.960

No, w=0 can fail. It is only unacknowledged:

http://docs.mongodb.org/manual/core/write-concern/#unacknowledged

Unacknowledged is similar to errors ignored; however, drivers will attempt to receive and handle network errors when possible.

Which means that the write can fail silently within MongoDB itself.

It is not reliable if you need a specific guarantee. At the end of the day, if you want to touch the database and get an acknowledgment from it, you must wait: laws of physics.
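From the driver's side, the difference can be sketched as follows. This is a minimal sketch assuming PyMongo 3 or later (the question uses the older `collection.update` call); the helper names `update_fields` and `modified_or_none` are made up for illustration. With w=0 the server sends no reply, so the result object carries no count to inspect.

```python
from types import SimpleNamespace


def modified_or_none(result):
    """Return the modified count, or None when the write was unacknowledged."""
    # With w=0 there is no server reply, so there is no count to read.
    return result.modified_count if result.acknowledged else None


def update_fields(collection, _id, fields, w=1):
    """Run a $set update on a pymongo Collection with the given w value."""
    # Imported lazily so the helper above can be tried without PyMongo installed.
    from pymongo.write_concern import WriteConcern

    target = collection.with_options(write_concern=WriteConcern(w=w))
    result = target.update_one({"_id": _id}, {"$set": fields})
    return modified_or_none(result)


# Stand-in for the result of an unacknowledged write, so this runs without a server.
fake_unacknowledged = SimpleNamespace(acknowledged=False)
print(modified_or_none(fake_unacknowledged))  # None
```

A return value of None here does not mean the update failed; it only means the client has no way of knowing.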

edited Nov 14 '14 at 16:22

answered Nov 14 '14 at 16:09


@Sammaye Without errors, can it fail under heavy load? – Dewsworld Nov 14 '14 at 16:12
@Dewsworld Since it *should* be able to catch and deal with network errors, if you were to have NO errors in MongoDB, including index errors etc then there is a small chance, very small, possibly too small – Sammaye Nov 14 '14 at 16:20
@Dewsworld as for internal workings: it will work off a job queue until it is done – Sammaye Nov 14 '14 at 16:23

Markus W Mahlberg · Answer 2 · 2014-11-17T22:44:29.703

Does `w:0` guarantee an update?

As Sammaye has written: no, since there may be a period in which the update has only been applied to the in-memory data and has not yet been written to the journal. If there is an outage during this window, your update may be lost. Depending on the configuration, the window is somewhere between 10 ms (with j:1 and the journal and the data files living on separate block devices) and 100 ms (the default).

Please keep in mind that with w:0, illegal updates (such as changing the _id of a document) will fail silently.
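The 10 ms and 100 ms figures can be reproduced with a little arithmetic. This is a rough sketch, not a mongod API: the function name is made up, the 100 ms default and the divide-by-3 rule come from this answer, and the 30 ms default for a journal on its own block device is an assumption taken from the MMAPv1-era documentation.

```python
def journal_window_ms(separate_journal_device, journaled_write_pending):
    """Approximate time an applied write may wait before reaching the journal."""
    # Assumed defaults: 100 ms, or 30 ms when the journal has its own block device.
    base = 30.0 if separate_journal_device else 100.0
    # A pending j:1 write shortens the commit interval to a third.
    return base / 3 if journaled_write_pending else base


print(journal_window_ms(False, False))  # 100.0
print(journal_window_ms(True, True))    # 10.0
```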

How does the update work with `w:0`?

Assuming there are no network errors, with w:0 the driver will return as soon as it has sent the operation to the mongod/mongos instance. But let's look a bit further to give you an idea of what happens under the hood.

Next, the update is processed by the query optimizer and applied to the in-memory data set. After successful application of the operation, a write with write concern w:1 would return at this point. The applied operations are synced to the journal every commitIntervalMs, which is divided by 3 when a write with j:1 is pending. If you have a write concern of {j:1}, the driver returns once the operations have been stored in the journal successfully. Note that there are still edge cases in which data that made it to the journal won't be applied to the other replica set members if a very "well" timed outage occurs at this point.

By default, every syncPeriodSecs, the data from the journal is applied to the actual data files.
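The stages described above can be pictured with a small toy model. It is only a sketch for intuition, not how mongod is implemented: the class `ToyMongod` and its method names are made up, everything runs synchronously, and `j=True` is treated as forcing an immediate journal commit.

```python
class ToyMongod:
    """Toy model: in-memory view, un-journaled operations, and a journal."""

    def __init__(self):
        self.memory = {}    # in-memory view of the documents
        self.pending = []   # applied in memory, not yet journaled
        self.journal = []   # the only part that survives a crash here

    def update(self, _id, fields, w=1, j=False):
        """Apply a $set-like update; return the point at which the driver stops waiting."""
        self.memory.setdefault(_id, {}).update(fields)
        self.pending.append((_id, dict(fields)))
        if j:
            self.commit_journal()
            return "journaled"
        return "applied in memory" if w >= 1 else "sent to server"

    def commit_journal(self):
        """Stand-in for the periodic journal commit (every commitIntervalMs)."""
        self.journal.extend(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        """Drop everything that was not journaled, then replay the journal."""
        self.memory = {}
        self.pending.clear()
        for _id, fields in self.journal:
            self.memory.setdefault(_id, {}).update(fields)


server = ToyMongod()
server.update(1, {"x": 1}, w=1, j=True)  # journaled before returning
server.update(2, {"x": 2}, w=1)          # acknowledged, but only in memory
server.update(3, {"x": 3}, w=0)          # not acknowledged at all
server.crash_and_recover()               # outage before the next journal commit
print(sorted(server.memory))             # [1]
```

In this model the w:1 and w:0 updates are lost in the same way; the only difference is that the w:1 client at least knew the update had been applied before the crash.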

Regarding what you saw in mongostat: its granularity isn't very high, so you might well see operations which took place in the past. As discussed, the update to the in-memory data isn't instant, as the update first has to pass the query optimizer.

Will heavy load make updates silently fail with `w:0`?

In general, it is safe to say "No." And here is why:

For each connection, a certain amount of RAM is allocated. If the load is so high that mongod can't allocate any more RAM, there will be a connection error, which is dealt with regardless of the write concern, except for unacknowledged writes.

Furthermore, the application of updates to the in-memory data is extremely fast, most likely still faster than the updates arrive if we are talking about load peaks. If mongod is totally overloaded (e.g. 150k updates a second on a standalone mongod with spinning disks), problems might occur, of course, though even that is usually mitigated from a durability point of view by the underlying OS.

However, updates may still silently disappear in case of an outage when the write concern is w:0,j:0 and the outage happens before the update has been synced to the journal.

Notes:

The optimal balance between maximum performance and minimal guaranteed durability is a write concern of j:1. With a proper setup, you can reduce the latency to slightly over 10ms.
To further reduce the latency per update, it might be worth having a look at bulk write operations, if those apply to your use case. In my experience, they do more often than not. Please read and try before dismissing the idea.
Doing write operations with w:0,j:0 is highly discouraged if you expect any guarantee of data durability. Use at your own risk. This write concern is only meant for "cheap" data, which is easy to reobtain or where the need for speed exceeds the need for durability. Collecting real-time weather data on a large scale would be an example: the system still works, even if one or two data points are missing here and there. For most applications, durability is a concern. Conclusion: use at least w:1,j:1 for durable writes.
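The bulk write suggestion from the notes could look roughly like this. It is a sketch under assumptions, not a drop-in replacement for the question's code: it assumes PyMongo 3 or later, the helper names `chunked` and `bulk_set` are made up, and the batch size of 1000 is arbitrary.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_set(collection, updates, batch_size=1000):
    """Apply (_id, fields) pairs in batches; return the number of matched documents."""
    # Imported lazily so the batching helper can be tried without PyMongo installed.
    from pymongo import UpdateOne
    from pymongo.write_concern import WriteConcern

    # w=1, j=True: wait until each batch is applied and journaled on the primary.
    durable = collection.with_options(write_concern=WriteConcern(w=1, j=True))
    matched = 0
    for chunk in chunked(updates, batch_size):
        requests = [UpdateOne({"_id": _id}, {"$set": fields}) for _id, fields in chunk]
        # ordered=False keeps going past an individual failure; failures are
        # reported afterwards through pymongo.errors.BulkWriteError.
        result = durable.bulk_write(requests, ordered=False)
        matched += result.matched_count
    return matched
```

The idea is that the acknowledgment is paid for once per batch rather than once per document, which is where most of the latency per update is saved.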

"With w:1, the sync interval is drastically reduced. " Do you have a reference for that? — Sammaye, Nov 15 '14 at 19:18
@Sammaye: I think it was mentioned in M102. And a simple test proves it: with the default settings and w:1 every write should take around the median of 30 seconds otherwise. — Markus W Mahlberg, Nov 15 '14 at 20:28
It does seem odd since it would make the fsync queue useless, I have done that course but I do not remember that mentioned — Sammaye, Nov 15 '14 at 20:37
On the other hand, it might have been a misunderstanding on my part - sometimes it bites not to be a native speaker. The term "applied" used in the docs may well refer to the application of the data to the in-memory views, too. Which would make more sense given the behavior mongod shows. — Markus W Mahlberg, Nov 15 '14 at 20:37
My own understanding is that w:1 says it has been done in memory, and then the sync setting remains the same and just synchs, the only thing that changes to disk time is j:1 — Sammaye, Nov 15 '14 at 20:39
w:1 is applied in memory, j:1 is applied in memory AND flushed to disk so that will be significantly higher latency. — Asya Kamsky, Nov 16 '14 at 23:19
Thanks @AsyaKamsky, for clarifying that. So basically the "only" difference between those two is that data is guaranteed to be successfully applied to the in memory view with `w:1` (provided there is no error produced by the query itself)? — Markus W Mahlberg, Nov 17 '14 at 04:55
the guarantee is that you won't get back a successful acknowledgement *until* the write happens in memory. With w:0 you will get successful acknowledgement when you hand-off the write (basically it won't be acknowledging anything but the fact that your request made it over the network to the server). — Asya Kamsky, Nov 17 '14 at 22:36
your answer describes w:1 incorrectly - you're describing it as if it's fsync:1 - w:1 just means apply in memory (write to one member). It has nothing to do with datafiles syncing. — Asya Kamsky, Nov 17 '14 at 22:38
@AsyaKamsky: Thanks for clarifying. Somehow it stuck in my mind that the w-parts are referring to the data files. Will fix that in my answer. — Markus W Mahlberg, Nov 17 '14 at 22:38
btw, w:0, anything:1 is meaningless - the first place that is written to is RAM so w:1 *must* be by definition satisfied before any other write concern is satisfied. — Asya Kamsky, Nov 17 '14 at 22:39
@AsyaKamsky: Makes sense - without application to RAM, no application to the journal, since the application might already return the applied operations. Corrected that. — Markus W Mahlberg, Nov 17 '14 at 22:47
@MarkusWMahlberg I don't think the last part (recommending w:1,j:1 for most durability) is quite correct - w:2 or w:majority is more durable than j:1 since j:1 does not ensure replication which means the data won't be available in case of failure of this machine. — Asya Kamsky, Nov 24 '14 at 09:06
@AsyaKamsky: I suggested to use `w:1,j:1` "at least". Of course, this won't prevent rollbacks, so `w:majority` is the preferable setting. — Markus W Mahlberg, Nov 24 '14 at 10:11
