
I would like to know how long MongoDB's internal cache is sustained. I have a scenario in which I have about one million records and have to perform a search on them using the mongo-java driver.

The initial search takes a lot of time (nearly one minute), whereas consecutive runs of the same query take only a few seconds, thanks to Mongo's internal caching mechanism.

But I do not know how long this cache lasts: until the system reboots, until the collection undergoes a write operation, or something else entirely?

Any help in understanding this is appreciated!

PS:

  • Of the fields the search is performed on, some are indexed and some are not.
  • Mongo version used: 2.6.1
Sree Karthik S R

3 Answers


It will depend on many factors, but the most prominent are the amount of memory in the server and how busy the server is, as MongoDB leaves much of the caching to the OS (by memory-mapping its data files).

You need to take a long hard look at your log files for the initial query and try to figure out why it takes nearly a minute.

Martin
  • I have figured out that a distinct query on an array field (it is an array because some documents hold one element and some hold several) across one million documents takes nearly a minute. But I do not have any other alternative. Removing duplicates in Java by copying a List into a Set costs about the same on the first hit and is costlier on subsequent ones. – Sree Karthik S R Oct 15 '15 at 10:58
  • It would help my analysis if you could share which factors the Mongo cache depends on, or any references about that. – Sree Karthik S R Oct 15 '15 at 12:10
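The client-side alternative mentioned in the comment above (collecting array values into a `List`, then de-duplicating via a `Set`) can be sketched as below. This is an illustrative stand-alone snippet: the documents are simulated as plain lists rather than fetched with the mongo-java driver, and all names are hypothetical.

```java
import java.util.*;

public class ClientSideDistinct {
    // Flatten the array-valued field from many documents and de-duplicate
    // the values by copying them into a Set, as described in the comment.
    static Set<String> distinctValues(List<List<String>> arrayFields) {
        List<String> all = new ArrayList<>();
        for (List<String> values : arrayFields) {
            all.addAll(values); // one array field per document
        }
        return new LinkedHashSet<>(all); // Set construction drops duplicates
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b"),
                Arrays.asList("c", "a"));
        System.out.println(distinctValues(docs)); // [a, b, c]
    }
}
```

The cost is one pass over all fetched values plus hashing, which matches the commenter's observation that it is no cheaper than a server-side `distinct` on the first hit.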

In most cases there is some internal cache-invalidation mechanism that drops a cached query's records when a write operation occurs. That is the simplest description of the process, just from my own experience. But, as mentioned earlier, many factors besides simple invalidation can play a role.
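The invalidate-on-write pattern this answer describes can be sketched as a toy in-memory cache. To be clear, this is not MongoDB's actual mechanism, only the general idea: cached query results are served until any write clears them.

```java
import java.util.*;

// Toy illustration of invalidate-on-write caching -- not MongoDB internals.
public class InvalidatingCache {
    private final Map<String, List<String>> store = new HashMap<>();      // the "collection"
    private final Map<String, List<String>> queryCache = new HashMap<>(); // cached results

    List<String> query(String key) {
        // Serve from the cache when possible; otherwise compute and remember.
        return queryCache.computeIfAbsent(key,
                k -> new ArrayList<>(store.getOrDefault(k, Collections.emptyList())));
    }

    void write(String key, String value) {
        store.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        queryCache.clear(); // any write drops every cached query result
    }

    public static void main(String[] args) {
        InvalidatingCache c = new InvalidatingCache();
        c.write("users", "alice");
        System.out.println(c.query("users")); // [alice]
        c.write("users", "bob");              // invalidates the cache
        System.out.println(c.query("users")); // [alice, bob]
    }
}
```

Under this pattern, the first query after any write pays the full cost again, which is consistent with the questioner's "fast until something changes" observation.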

lazycommit

MongoDB automatically uses all free memory on the machine as its cache. It would be better to use a MongoDB 3.0+ version, because it ships with two storage engines: MMAPv1 and WiredTiger.

The major difference between the two is locking granularity: a write operation under MMAPv1 locks at the database level, whereas WiredTiger locks down to the document level.

If you are using MongoDB 2.6 you can check query performance and execution time with the explain() method in the shell; in 3.0+ you can pass "executionStats" to explain() for more detail.

You need an index on the particular field you query to get results faster. A single collection cannot have more than 64 indexes, and the more indexes a collection has, the greater the performance impact on write/update operations.

Mukul Mantosh
  • I do not understand how upgrading to MongoDB 3.0 would help in my scenario. I know about **MMAP** and **WiredTiger**, but I guess they won't help in my case. Regarding my query, I have to scan and traverse the entire set of documents at least once, since the read operation I do is a consolidated search. The field is already indexed, and the index takes nearly 100MB. – Sree Karthik S R Oct 15 '15 at 12:05
  • I have inserted 10 million records into MongoDB, using the PHP driver for all requests (CRUD). When I queried those 10 million records without an index it took some time, though not one minute: around 7-10 seconds. After I indexed the particular field, the execution time dropped to 1-3 milliseconds. Without an index the entire collection is scanned, so it is definitely going to take some time, because MongoDB reads all the records from beginning to end. – Mukul Mantosh Oct 15 '15 at 14:33
  • db.ro.distinct("requiredField") is the query I run. The field "requiredField" is an array, and I need a distinct copy of its values, since there are duplicates across documents. Even this takes more than a minute the first time. – Sree Karthik S R Oct 16 '15 at 10:37
  • Stats : { "ns" : "myDB.myCollection", "count" : 833798, "size" : 2332262304, "avgObjSize" : 2797, "storageSize" : 2897301504, "numExtents" : 21, "nindexes" : 7, "lastExtentSize" : 756662272, "paddingFactor" : 1, "systemFlags" : 0, "userFlags" : 1, "totalIndexSize" : 603936592, "indexSizes" : { "_id_" : 93778720 }, "ok" : 1 } – Sree Karthik S R Oct 16 '15 at 10:57
  • If you are using a distinct query then there is little point in indexing; in my experience a distinct query does a full scan from beginning to end, so it is definitely going to take longer. I performed a distinct query over 1 million duplicate records and got results in 3 seconds, and the time will only grow as the size increases. Please have a look at the distinct query I performed [link](http://pastebin.com/Lv27CbG0), and visit this [link](https://groups.google.com/forum/#!topic/mongodb-user/mTOmJjIGN_4); you will get your answer. – Mukul Mantosh Oct 16 '15 at 13:28