
I am investigating some performance issues in my mongo setup. I have 3 query controllers and 6 shards. I'm doing a large batch import/update, and on the 3 query controllers, I'm getting around 200 queries per second. However, when investigating the shards, I'm seeing tons of faults, like 50 per second. When I run top on the shard servers, I see they're only using around 8% of memory. That sort of implies to me that mongo is not configured on the shard servers to use all memory available. Am I wrong? Thanks for any advice.

user1130176

1 Answer


I have 3 query controllers and 6 shards.

I assume you mean three mongos machines, AKA "routers"? I've never heard the term "query controller" in reference to MongoDB. It's non-standard enough to make the question confusing.

I'm doing a large batch import/update, and on the 3 query controllers, I'm getting around 200 queries per second.

Okay so roughly 600 queries per second are hitting your sharded environment.

However, when investigating the shards, I'm seeing tons of faults, like 50 per second. When I run top on the shard servers, I see they're only using around 8% of memory. That sort of implies to me that mongo is not configured on the shard servers to use all memory available. Am I wrong? Thanks for any advice.

I'm assuming you're running top on the host OS and not mongotop against the running shard member. My read is that you'll want to investigate MongoDB's memory-mapped storage. That's a huge topic, and a good weekend's worth of reading, but here are the CliffsNotes:

  • MongoDB uses the host OS memory management to cache database files in RAM.
  • Just because the MongoDB process is taking up X amount of RAM doesn't mean that it's not also effectively using Y amount.
  • In Linux, look at the cached memory; much of that will most likely be what MongoDB is "faulting" to in mongostat. Faulting in mongostat does not mean hard faults to disk; it means faults outside the working set that the mongod process is using. It could still be hitting the files in RAM, and be virtually as fast as if the memory was actually possessed by the mongod.
  • If you want to determine if MongoDB is out of RAM you need to look at the host's hard fault numbers. If you're hard faulting and hitting disk for data, then and only then is MongoDB out of memory. To me it wasn't clear if the faults you were talking about were mongostat faults, or OS soft faults, or OS hard faults.
  • MongoDB uses the memory that MongoDB wants to use. There's not much in the way of optimization you can do other than using the touch command in a warm-up script, which pulls data into memory by brute force rather than letting the mongod organically load the working set over the lifespan of the application's regular query load. There's very little reason to do that, though, except for quickly heating up memory on a new primary after a stepdown, or for a new mongod PID (after a reboot, for example).
  • Hey kids! MongoDB memory troubleshooting is hard! Let's go read up on it! ᕕ( ᐛ )ᕗ
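The OS-level checks described above can be sketched as a quick diagnostic session. This is a sketch assuming a Linux host with procps tools; the collection name in the warm-up comment is a placeholder:

```shell
# How much RAM is the OS using as file cache? That cache is where
# MongoDB's memory-mapped data files actually live.
free -m

# Watch paging activity: sustained nonzero "si"/"so" (swap in/out)
# columns indicate real hard faults hitting disk.
vmstat 1 3

# Cumulative major (hard) fault count for the mongod process, if any.
if pgrep -x mongod >/dev/null; then
  ps -o pid,min_flt,maj_flt,rss,cmd -p "$(pgrep -x mongod | head -n 1)"
else
  echo "mongod not running on this host"
fi

# Brute-force warm-up mentioned above (collection name is a placeholder):
# mongo --eval 'db.runCommand({ touch: "mycoll", data: true, index: true })'
```

If `maj_flt` climbs steadily while the workload runs, the data really is coming off disk; if it stays flat while mongostat reports faults, you're only seeing soft faults served from the OS cache.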

To me, nothing you've described is unusual for MongoDB, provided those faults aren't hard faults. The mongod itself can be holding very little RAM relative to what's available on the host, but the OS's cache will be full of the memory-mapped files that MongoDB refers to. One recent MongoDB instance I was troubleshooting had 144GB of RAM and the mongod was using less than 10% of it, yet the host Linux OS had over a hundred GB of RAM listed as cached mem and there were no disk faults. I.e., MongoDB was happy.

If they were hard faults in the OS, and the OS's swap were being used... that would be odd, and an OS-level tuning problem.
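A minimal check for that case (a sketch assuming a Linux host):

```shell
# Nonzero swap usage here points to host-level memory pressure,
# i.e. an OS tuning problem rather than a MongoDB one.
free -m | awk '/^Swap:/ {print "swap used (MB): " $3}'
```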

Wesley
  • On Linux, the faults reported in mongostat are in fact hard faults to disk. Soft faults are not reported (except on Windows when both are reported), so most of the rest of the argument that follows in your answer doesn't hold. – Adam C Jan 03 '15 at 01:15
  • @AdamC Okay, I've been in a similar situation with a MongoDB instance and it was the Linux host not freeing up cached memory. Something like `free && sync && echo 3 > /proc/sys/vm/drop_caches && free` fixed it up. Here's more about what that does: http://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system – Wesley Jan 03 '15 at 01:23
  • sync just makes sure that data is flushed to disk, and free is just reporting, so really the drop_caches piece is the only relevant part, and that effectively just drops all caches (for non-running processes only). It doesn't really have anything to do with hard/soft faulting, besides the fact that if you do it while mongod is not running, all data will need to be paged in from disk when you restart it. I suppose with a kernel in a bad state it *might* help, but I think you would have other issues at that point too – Adam C Jan 03 '15 at 01:48