MongoDb - Utilizing multi CPU server for a write heavy application

Question

I am currently evaluating MongoDB for our write-heavy application...

Currently MongoDB uses a single thread for write operations and also takes a global lock whenever it writes... Is it possible to exploit multiple CPUs on a multi-CPU server to get better write performance? What are your workarounds for the global write lock?

check http://stackoverflow.com/questions/2954957/mongodb-vs-couchdb-speed-optimization. — Andrew Orsich, Dec 10 '10 at 09:38

score 19 · Answer 1 · edited Aug 11 '23 at 16:52

No, it is still recommended to use sharding to utilize multiple CPU cores. As stated in the FAQ:

Sharding improves concurrency by distributing collections over multiple mongod instances, allowing shard servers (i.e. mongos processes) to perform any number of operations concurrently to the various downstream mongod instances.

Each mongod instance is independent of the others in the shard cluster and uses the MongoDB readers-writer lock. The operations on one mongod instance do not block the operations on any others.

Sharding on a single box has its issues, as one user stated on the mongodb-user mailing list:

After some significant experimentation, I've found a single MongoDB shard daemon CANNOT use more than one CPU. On a 24 CPU box, performance scales up until we hit about 8 shards, then another limit kicks in.

Thank you very much for this answer. So our database servers will get two quad-core CPUs. — Philipp, Mar 14 '13 at 12:52

score 18 · Accepted Answer · answered Dec 10 '10 at 16:14

So right now, the easy solution is to shard.

Yes, normally sharding is done across servers. However, it is completely possible to shard on a single box. You simply fire up the shards on different ports and provide them with different folders. Here's a sample configuration of 2 shards on one box.
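
As a rough sketch of the "different ports, different folders" idea (not the configuration the answer originally linked to), the Python below only builds the `mongod` command lines for N shards on one box. The base port `27018`, the data root `/data/mongo`, and the function name `shard_commands` are assumptions made for illustration; `--shardsvr`, `--port`, and `--dbpath` are standard `mongod` options.

```python
from pathlib import PurePosixPath


def shard_commands(count, base_port=27018, data_root="/data/mongo"):
    """Return one mongod argv list per shard.

    Each shard gets its own port and its own data folder, which is the
    only thing that distinguishes the daemons when they share a machine.
    base_port and data_root are illustrative defaults, not recommendations.
    """
    commands = []
    for i in range(count):
        dbpath = PurePosixPath(data_root) / f"shard{i}"
        commands.append([
            "mongod",
            "--shardsvr",                 # run this daemon as a shard member
            "--port", str(base_port + i), # unique port per shard
            "--dbpath", str(dbpath),      # unique data folder per shard
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of launching them, so nothing is started
    # by accident; the folders must exist before mongod is run for real.
    for argv in shard_commands(2):
        print(" ".join(argv))
```

This covers only the shard daemons. A working cluster also needs a config server and a `mongos` router, and newer MongoDB releases, as far as I know, expect each shard to run as a replica set (`--replSet`), so treat the output as a starting point to adapt.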

The MongoDB team recognizes that this is kind of sub-par, and I know from talking to them that they're looking at better ways to do this.

Obviously, once you get multiple shards on one box and increase your write threads, you will have to be wary of disk IO. In my experience, I've been able to saturate disks with a single write thread. If your inserts/updates are relatively simple, you may find that extra write threads don't do anything. (Map-reduce jobs are the exception here; sharding definitely helps there.)

Gates VP


Thanks... I will give it a try... We do have to do some large scale analysis and this could be really helpful on a powerful server.. Do you know of any negative effect of such configuration? – StackUnderflow Dec 10 '10 at 17:11
Negative effects: the extra complexity of setting up a cluster on a single node, and of course the single point of failure. Obviously, I would test such a solution first to see if it meets your needs, and I think it's fair to start by running a single node to see what the throughput is like :) – Gates VP Dec 13 '10 at 03:16
How many inserts/sec were you doing to saturate your disk I/O? I can't get anywhere near sequential write saturation with a single thread. – EhevuTov Oct 20 '11 at 05:21
That will depend heavily on the size of the documents. If you insert 100 docs/sec @ 1MB each, then you need 100MB/s of throughput (typical server drive). If you're inserting lots of 1k docs, then you need 100k docs/sec to saturate that same drive and you're probably not getting 100k inserts/sec single-threaded. – Gates VP Oct 20 '11 at 18:52
Has this answer changed in the past two years? – Philipp Mar 10 '13 at 23:27
@Philipp the question has changed in two years. This question states as assumptions things that are no longer true. Therefore it's a bad idea to add bounty to a misleading (as of today) question. Please ask a new question if you want to know how to maximize write-throughput. There are very few applications which are actually limited by the write lock - most encounter I/O bandwidth limitations far before then. – Asya Kamsky Mar 11 '13 at 23:34
@Philipp under the most recent version, the write lock is now implemented at the DB level. Of course, _in theory_ this would mean that all non-trivial collections (*tables*) should now be implemented as independent databases to maximize throughput. In practice, I do not know how well this will work and strongly recommend testing your use case. I have regularly seen maxed out Disk IO without any "tricks". – Gates VP Mar 12 '13 at 00:25
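
The document-size arithmetic from the comments above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 100MB/s drive figure comes from the comment, decimal units (1MB = 1,000,000 bytes) are an assumption, and the function name is made up for illustration. It ignores journaling, indexes, and per-document overhead, so real numbers will be lower.

```python
MB = 1_000_000  # decimal megabyte, assumed for simplicity
KB = 1_000      # decimal kilobyte, assumed for simplicity


def docs_per_second_to_saturate(disk_bytes_per_sec, doc_bytes):
    """Inserts per second needed to fill the drive's sequential write bandwidth."""
    return disk_bytes_per_sec // doc_bytes


DISK_THROUGHPUT = 100 * MB  # "typical server drive" figure from the comment

if __name__ == "__main__":
    # 1MB documents: about 100 inserts/sec already fill the drive.
    print(docs_per_second_to_saturate(DISK_THROUGHPUT, 1 * MB))
    # 1k documents: about 100,000 inserts/sec are needed, which a single
    # writer thread is unlikely to reach.
    print(docs_per_second_to_saturate(DISK_THROUGHPUT, 1 * KB))
```

So whether the disk or the write lock is the first bottleneck depends mostly on document size, which is worth measuring before adding shards.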
