1

On mongoDB 3.6.3 I create this collection with two million records:

function randInt(n) { return parseInt(Math.random()*n); }

for(var j=0; j<20; j++) {  
  print("Building op "+j);
  var bulkop=db.media.initializeOrderedBulkOp() ;
  for (var i = 0; i < 100000; ++i) {
    bulkop.insert(    
      {
        id_profile: NumberLong("222"),
        needle_id: NumberInt(randInt(2000000000)),
        visibility: NumberInt(randInt(5)),
      }
    )
  };
  print("Executing op "+j);
  bulkop.execute();
}

then I create this partial index :

db.media.createIndex( 
  {"id_profile": 1, "visibility": 1}, 
  {unique: false, partialFilterExpression: { "needle_id": { $exists: true } }} 
); 

then I run this query that exactly matches the partial index :

db.media.count({$and:[
  {id_profile:NumberInt(222)},
  {visibility:NumberInt(0)},
  {needle_id:{$exists:true}}]}) 

but it's slow :( In fact it's the same speed as if I was not using a partial index and where I need to filter all docs who don't have needle_id:

db.media.createIndex( 
  {"id_profile": 1, "visibility": 1}, 
  {unique: false} 
); 

db.media.count({$and:[
  {id_profile:NumberInt(222)},
  {visibility:NumberInt(0)},
  {needle_id:{$exists:true}}]}) 

So is it a bug with partial index ? What I can do to speed my count ?

Vince Bowdren
  • 8,326
  • 3
  • 31
  • 56
zeus
  • 12,173
  • 9
  • 63
  • 184
  • Are you using mongodb version 3.2 or higher? – Tal Z Apr 30 '18 at 10:02
  • 1
    the very last version of mongoDB 3.6.3 – zeus Apr 30 '18 at 10:15
  • could it be it is the same speed as the other index because there are no items without `needle_id` in your collection? – Tal Z Apr 30 '18 at 10:39
  • no its much more slower than db.media.count({$and:[ {id_profile:NumberInt(222)}, {visibility:NumberInt(0)}]}) – zeus Apr 30 '18 at 11:36
  • Hi loki; could you try a few variants on your query, to see if any of them make a difference; for example, instead of using $and you could just add conditions to the basic query object. It would be interesting to see which queries use the partial index and which don't. – Vince Bowdren May 03 '18 at 13:36

1 Answers1

1

I investigated this issue for quite some time now, it seems that running a count query on a partial index does not avoid iterating over all the documents of the index.

When creating an index on a large collection, running a count query which is based on the index, finishes within few milliseconds. it is actually counting the index's size for the values that was provided.

When looking at the execution plan of such a query, we can notice the 'IDX_SCAN' phase solely, which stands for index scan. (scanning the index on the requested bounds and counting number of documents it contains)

But when using a partial index, it seems the count functionality as implemented by mongo is NOT doing an index scan, but instead its running over the documents as if we ran a modify query.. then eventually returns the total amount of docs.

this can be proved by looking at the execution plan for such a query, and see that it shows the phase is 'FETCH', and not 'IDX_SCAN', but fetch using the index, so its quicker then if an index was not created at all.

More complete information Can be found here: Why does the "distinct" and "count" commands happen so slowly on indexed items in MongoDB?