0

I have a index on id_profile and i do db.myCollection.count({"id_profile":xxx}). It's quite fast if the count is low, but if the count is large, it starts being slow. For example if there is 1 000 000 records matching {"id_profile":xxx} then it can take up to 500 ms to return the count. I think that internally the engine is simply loading all the documents matching {"id_profile":xxx} to count them.

Is there a way to quickly retrieve a count where the filter match exactly an index? I would like to avoid to use a counter collection :(

NOTE: I m on mongoDB 3.6.3 and this the script i used:

db.createCollection("following");
db.following.createIndex( {"id_profile": 1}, {unique: false} ); 

function randInt(n) { return parseInt(Math.random()*n); }

for(var j=0; j<10; j++) {  
  print("Building op "+j);
  var bulkop=db.following.initializeOrderedBulkOp() ;
  for (var i = 0; i < 1000000; ++i) {
    bulkop.insert(    
      {
        id_profile: NumberLong("-4578128619402503089"),
        id_following: NumberLong(randInt(9223372036854775807))
      }  
    )
  };
  print("Executing op "+j);
  bulkop.execute();
}

db.following.count({"id_profile":NumberLong("-4578128619402503089")});
zeus
  • 12,173
  • 9
  • 63
  • 184
  • Hm, indeed. In which case, that's weird. The query should be backed by the index, and it shouldn't be that slow. How's the load on the database? A lot of queries like this? Or other heavy queries/inserts? – Sergio Tulentsev Apr 23 '18 at 22:33
  • Probably not "weird" at all *" ... if there is 1 000 000 records matching..."* well if there are that many "matches" then it's pretty logical that the collection has a lot of documents. *"...I think that internally the engine is simply loading all the documents ..."*, **no** it actually does not, but even with an index the "query plan" still takes time to execute. Sounds like you simply have "a **lot** of data" and your server deployment simply is not up to the task. If you truly have "millions" of documents, then you probably need to consider sharding to "spread the load". – Neil Lunn Apr 24 '18 at 04:42
  • Beyond *"use an index"* there really is not a "coding" solution to this. As such it's not really on topic for this site. – Neil Lunn Apr 24 '18 at 04:43
  • no, the index and database is in memory, so everything is up. i calculate that it's take around 0.0005 ms per doc in the matching, so 5ms for 10 000 doc matching the query, 50ms for 100 000 doc, and so one ... but it's look like the engine load or scan all id_profile equal to xxx – zeus Apr 24 '18 at 07:31
  • @loki: show that index definition. – Sergio Tulentsev Apr 24 '18 at 09:25
  • @NeilLunn: my only other guess is that the index on `id_profile` actually looks something like this `{ id_foo: 1, id_profile: 1 }` (which can't help the abovementioned query). This is assuming that all data and indexes actually _are_ in ram, as loki claims. – Sergio Tulentsev Apr 24 '18 at 09:30
  • 1
    @SergioTulentsev True. I'm really just trying to avoid protracted discussion of *" ... it's look like the engine load .."* when it's been stated that for a single field index it does not. Unless of course `db.myCollection.find({},{ "id_profile": 1, "_id": 0}).count()` performs any different. But it really should not. The "increasing numbers" seems more of an indication that everything is in fact not in memory at all. And again, once in this territory the remaining discussion becomes "off-topic". – Neil Lunn Apr 24 '18 at 09:37
  • @SergioTulentsev: this is the index i use: `db.myCollection.createIndex( {"id_profile": 1, "_id": -1}, {unique: true} );` – zeus Apr 24 '18 at 11:24
  • @loki: try a single field index. – Sergio Tulentsev Apr 24 '18 at 11:29
  • @SergioTulentsev: it's the same with single index field – zeus Apr 24 '18 at 11:38
  • 1
    Can you edit the question to include the specific version of the MongoDB server used (x.y.z) and the `explain()` output for your `count()`? Are you averaging the execution time of the count over multiple iterations with the same criteria or is the outcome from a single `count()` query? Also, how are you confirming which index is being used and whether it is in memory? – Stennie Apr 24 '18 at 13:11
  • 1
    @Stennie: I just updated the question with with the script – zeus Apr 24 '18 at 15:21
  • @loki: just so you know, your script inserts 10m documents, all with the same id_profile (instead of 10 batches of 1m, which you probably intended). But yes, I reproduced the slowness which I do not yet know how to explain. The index should be faster than this... – Sergio Tulentsev Apr 24 '18 at 15:59
  • @SergioTulentsev: I do fastly the script and i made a little mistake, but doesn't matter it's also show well the problem as you see ... – zeus Apr 24 '18 at 17:38

0 Answers0