
For example, I have documents like this in the collection:

{
    "key": "key1",
    "time": 1000,
    "values": [] // this one is optional
}

I need to update the collection from, say, a CSV file by modifying or removing the values field, using key and time as the filter.

What I've tried so far:

  • DeleteMany with a filter of or(and(key: key1, time: time1), ... 276k more or clauses), followed by InsertMany with 276k documents => ~90 seconds
  • Bulk ReplaceOne with filter: and(key: key1, time: time2) (sketched after this list) => ~40 seconds
  • Splitting the huge bulk into several smaller batches (7,500 seems to be the most performant), though this is not atomic as a single DB operation => ~35 seconds
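
For reference, a minimal sketch of the bulk ReplaceOne approach with the MongoDB Java sync driver (the driver mentioned in the comments below). The connection string, database/collection names, and the loadCsvRows() helper are assumptions, not part of my actual setup:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BulkReplaceSketch {
    public static void main(String[] args) {
        // Connection string and namespace are placeholders.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("data");

            // One ReplaceOneModel per CSV row: filter on (key, time), replace the whole document.
            List<ReplaceOneModel<Document>> models = new ArrayList<>();
            for (Document row : loadCsvRows()) {
                models.add(new ReplaceOneModel<>(
                        Filters.and(
                                Filters.eq("key", row.getString("key")),
                                Filters.eq("time", row.getLong("time"))),
                        row,
                        // upsert(true) is an assumption; drop it if every (key, time) pair already exists
                        new ReplaceOptions().upsert(true)));
            }

            // Unordered, as in the notes, so the server can apply writes in parallel.
            coll.bulkWrite(models, new BulkWriteOptions().ordered(false));
        }
    }

    // Hypothetical CSV loader; the real parsing is out of scope here.
    private static List<Document> loadCsvRows() {
        return new ArrayList<>();
    }
}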

Notes:

  • All tests were run with bulk.ordered = false to improve performance.
  • There is a unique index on key: 1, time: -1 (index creation is sketched after these notes).
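
For completeness, a sketch of creating that unique compound index with the Java driver (the class and method names are just for illustration):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

class EnsureUniqueIndex {
    // Unique compound index on { key: 1, time: -1 }, matching the note above.
    static void ensure(MongoCollection<Document> coll) {
        coll.createIndex(
                Indexes.compoundIndex(Indexes.ascending("key"), Indexes.descending("time")),
                new IndexOptions().unique(true));
    }
}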

Is there any way to optimize this kind of request? I know MongoDB can burst to ~80k inserts/s, but what about replacements?

Andrii Abramov

1 Answer


Bulk operations are not atomic as the submitted group. Only individual operations are atomic. Note also that the driver will split bulk operations into smaller batches automatically if you submit more than a certain number (1,000 when encryption is not used), which is why huge batches tend to perform worse than batches of under one thousand.
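
To make the batching concrete, here is a sketch of submitting the replacement models in fixed-size chunks; the models list and collection handle are assumed to come from the sketch in the question, and the batch size is a parameter, not a recommendation:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.ReplaceOneModel;
import org.bson.Document;

import java.util.List;

class ManualBatching {
    // Each bulkWrite call is its own server round trip, so the whole sequence
    // is not atomic -- which matches the caveat in the question's third bullet.
    static void writeInBatches(MongoCollection<Document> coll,
                               List<ReplaceOneModel<Document>> models,
                               int batchSize) {
        for (int i = 0; i < models.size(); i += batchSize) {
            List<ReplaceOneModel<Document>> batch =
                    models.subList(i, Math.min(i + batchSize, models.size()));
            coll.bulkWrite(batch, new BulkWriteOptions().ordered(false));
        }
    }
}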

To answer your question on performance:

  • Create a test deployment using tmpfs for storage.
  • Find out how many queries/second this deployment can sustain.
  • Find out how many updates/second this deployment can sustain.
  • If the number of updates/second is about half the number of queries/second, you are probably operating at maximum efficiency (a rough timing probe is sketched after this list).
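
One rough way to get those two numbers, as a sketch only (the key/time values generated here are assumptions; use data that matches your own collection):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.ArrayList;

class ThroughputProbe {
    // Times n point queries, then n single-document replacements, against the
    // same collection, and prints the sustained rates and their ratio.
    static void run(MongoCollection<Document> coll, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            coll.find(Filters.and(Filters.eq("key", "key" + i), Filters.eq("time", (long) i)))
                .first();
        }
        double queriesPerSec = n / ((System.nanoTime() - start) / 1e9);

        start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            coll.replaceOne(
                    Filters.and(Filters.eq("key", "key" + i), Filters.eq("time", (long) i)),
                    new Document("key", "key" + i)
                            .append("time", (long) i)
                            .append("values", new ArrayList<>()));
        }
        double updatesPerSec = n / ((System.nanoTime() - start) / 1e9);

        System.out.printf("queries/s=%.0f updates/s=%.0f ratio=%.2f%n",
                queriesPerSec, updatesPerSec, updatesPerSec / queriesPerSec);
    }
}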

You are going to have lower performance with SSD or magnetic disk backing storage, naturally. The idea behind the in-memory test is to establish how efficiently you can use the database at all.

Especially with a mixed read and write workload, if you are using a magnetic disk, switching to SSD storage should yield a noticeable performance gain.

D. SM
  • 1) Thanks for the suggestion. 2) Will try it out and respond. 3) As of MongoDB 3.6, the single-bulk [limit value is 100,000](https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/) – Andrii Abramov Feb 05 '21 at 10:52
  • The relevant spec is https://github.com/mongodb/specifications/blob/master/source/driver-bulk-update.rst. – D. SM Feb 05 '21 at 13:36
  • Why? [The doc clearly says](https://docs.mongodb.com/manual/reference/limits/#Write-Command-Batch-Limit-Size): Changed in version 3.6: The limit raises from 1,000 to 100,000 writes. This limit also applies to legacy OP_INSERT messages. – Andrii Abramov Feb 05 '21 at 16:25
  • I can see that the Java driver has 1000 as the default in `com.mongodb.internal.connection.MessageSettings.DEFAULT_MAX_BATCH_COUNT`, but it can be overridden – Andrii Abramov Feb 05 '21 at 16:32
  • The driver creates the request to the server. The higher limit that the server has doesn't come into play. – D. SM Feb 05 '21 at 19:00