
I have a static database (it will never even receive a write) of around 5 GB, while my server has 30 GB of RAM. I'm focused on returning complicated aggregations to the user as fast as possible, so I see no reason not to have (a) the indexes and (b) the entire dataset stored entirely in RAM, and (c) have them loaded there automatically whenever the Mongo server boots up. Currently my main bottleneck is running group commands to find unique elements out of millions of rows.

My question is, how can I do any of (a), (b), or (c) while running on the new Mongo/WiredTiger? I know the "touch" command doesn't work with WiredTiger, so most of the information on the Internet seems out of date. Are (a), (b), and (c) already done automatically? Or should I not be doing these steps at all for this use case?

collisionTwo
  • Could you give an example of your model and query? – gtsouk Jul 22 '15 at 21:04
  • Well, I have a couple million documents that look like `{field1: "a", field2: "b"}`, with indexes on both field1 and field2, and a compound index on field1 and field2; I run an aggregate pipeline where I match on field1 and field2, then $group by field1 to get all unique values of field1, then $group again to count them, and return the count (see the sketch after these comments). It can take seconds to return a count, when I'd like to get it to <1 s. – collisionTwo Jul 22 '15 at 21:13
  • Use db.serverStatus() to figure out aggregate totals of what mongo is doing with your memory. Use mongostat to check out the "dirty %" and "used %" to figure out how much of your cache that WiredTiger is using. Also I'm guessing that you're not somehow setting the --wiredTigerCacheSizeGB to some small value when you start Mongo? – womp Jul 22 '15 at 21:56
  • Here's my block-manager and cache results from db.serverStatus(): http://pastebin.com/6MgqQyBS, and my wiredTigerCacheSizeGB is set to 25. – collisionTwo Jul 22 '15 at 22:07
  • Try running aggregate().explain() to see if your indices are being used – gtsouk Jul 23 '15 at 06:29
  • With the above data structure, and since you mentioned that the data never change, you could have everything precomputed. Calculate the distinct values with their counts once, store those in a new collection, and do queries on that. – gtsouk Jul 23 '15 at 06:34
  • The indexes are being used (lol, querying this collection on unindexed fields takes an untold amount of time), and unfortunately, the complexity of the sorts of requests we get makes precomputation infeasible (e.g., if a user queries on field2, we don't necessarily know the # of unique values of field1 that match that field2 query, and can't store that for all field2 values, etc.). However, I seem to have fixed the issue by using a RAM disk. – collisionTwo Jul 23 '15 at 18:42
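
For reference, here is a sketch of the pipeline described in the comments above, with the plan check gtsouk suggests. The `mydb`/`coll` names and the `field2: "b"` predicate are placeholders, not the asker's actual schema:

```
mongo mydb --quiet --eval '
  printjson(db.coll.aggregate(
    [
      { $match: { field2: "b" } },                    // filter (placeholder predicate)
      { $group: { _id: "$field1" } },                 // one doc per unique field1 value
      { $group: { _id: null, count: { $sum: 1 } } }   // count those docs
    ],
    { explain: true }                                 // report the plan instead of running it
  ))
'
```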

1 Answer


Normally you shouldn't have to do anything. Disk pages are loaded into RAM on demand and stay there; when free memory runs out, the oldest unused pages are evicted so that other programs that need the memory can use it.
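
To see how much of the cache is actually populated, here is a sketch using the commands already mentioned in the comments (it assumes a default local mongod):

```
# Print WiredTiger cache statistics (byte counts); compare
# "bytes currently in the cache" against "maximum bytes configured".
mongo --quiet --eval 'printjson(db.serverStatus().wiredTiger.cache)'

# Live view every 5 seconds; watch the "used %" and "dirty %" columns.
mongostat 5
```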

If you must have your whole DB in RAM, you can use a ramdisk and tell MongoDB to use it as its storage device.
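
For example, a minimal sketch on Linux (the mount point, the 10 GB size, and the data path are assumptions; adjust them to your setup). Note this only covers (c) from the question if the mount-and-copy is scripted to run at boot, since tmpfs contents vanish on reboot:

```
# Create a tmpfs mount big enough for the 5 GB data set plus indexes.
sudo mkdir -p /mnt/mongo-ram
sudo mount -t tmpfs -o size=10g tmpfs /mnt/mongo-ram

# Copy the existing database files onto the ramdisk (the source path is
# the default Linux dbPath; adjust to your installation).
sudo cp -R /var/lib/mongodb/. /mnt/mongo-ram/

# Start mongod against the ramdisk. The data is static, so journaling
# can be skipped.
mongod --dbpath /mnt/mongo-ram --storageEngine wiredTiger --nojournal
```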

I would recommend that you revise your indices and/or data structures. Having the correct ones can make a huge difference in performance. We are talking about seconds vs hours.
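For the distinct-count pattern described in the comments, one concrete thing to try is a compound index plus distinct() in place of the double-$group. This is a hedged sketch, not a guaranteed fix: the collection and field names are placeholders carried over from the comments, and you should verify with explain() that the index is actually chosen:

```
mongo mydb --quiet --eval '
  // Compound index: the filtered field first, the grouped field second,
  // so both the filter and the distinct values can benefit from the index.
  db.coll.createIndex({ field2: 1, field1: 1 });

  // distinct() with a query filter can exploit that index; .length gives
  // the number of unique field1 values.
  print(db.coll.distinct("field1", { field2: "b" }).length);
'
```
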

gtsouk
  • How can I tell what Mongo is loading into RAM? My RAM usage is steady at around 1 GB, even after restarting mongo, and doesn't seem to change depending on what queries I run. The database is 5 GB and the total index size is around 3 GB. – collisionTwo Jul 22 '15 at 21:17