2

I have created 4 indexes to test query performance in my collection when quering for two fields of the same document, one of which is an array (needs a multi-key index). Two of the indexes are single and two compound.

I am surpised because of getting better performance with one of the single indexes than with the compound ones. I was expecting to obtain the best performace with a compound index, because I understand that it indexes the two fields allowing for faster querying.

These are my indexes:

{    "v" : 1, 
     "key" : { "_id" : 1 }, 
     "ns" : "bt_twitter.mallorca.mallorca", 
     "name" : "_id_"  
}, 
{    "v" : 1, 
     "key" : { "epoch_creation_date" :1 }, 
     "ns" : "bt_twitter.mallorca.mallorca", 
     "name" : "epoch_creation_date_1"  
}, 
{     "v" : 1, 
      "key" : { "related_hashtags" : 1 }, 
      "ns" : "bt_twitter.mallorca.mallorca", 
      "name" : "related_hashtags_1"  
},  
{     "v" : 1, 
      "key" : { "epoch_creation_date" : 1, "related_hashtags" : 1 }, 
      "ns" : "bt_twitter.mallorca.mallorca", 
      "name" : "epoch_creation_date_1_related_hashtags_1"  
}

My queries and performance indicators are (hint parameter shows the index used at each query):

QUERY 1:

active_collection.find(
    {'epoch_creation_date': {'$exists': True}}, 
    {"_id": 0, "related_hashtags":1}
).hint([("epoch_creation_date", ASCENDING)]).explain()

millis: 237

nscanned: 101226

QUERY 2:

active_collection.find(
    {'epoch_creation_date': {'$exists': True}}, 
    {"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING)]).explain()

millis: 1131

nscanned: 306715

QUERY 3:

active_collection.find(
     {'epoch_creation_date': {'$exists': True}},
     {"_id": 0, "related_hashtags": 1}
).hint([("epoch_creation_date", ASCENDING), ("related_hashtags", ASCENDING)]).explain()

millis: 935

nscanned: 306715

QUERY 4:

active_collection.find(
     {'epoch_creation_date': {'$exists': True}}, 
     {"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING),("epoch_creation_date", ASCENDING)]).explain()

millis: 1165

nscanned: 306715

QUERY 1 scans less documents, what is probably the reason to be faster. Can somebody help me to understand why is it performing better than queries with compound indexes? Therefore, when is better to use a compound index than a single one?

I am reading mongo documentation but these concepts are resulting hard for me to digest.

Thanks in advance.

UPDATED question (in response to Sammaye and Philipp)

This is the result of a full explain()

"cursor" : "BtreeCursor epoch_creation_date_1",
"isMultiKey" : false,
"n" : 101226,
"nscannedObjects" : 101226,
"nscanned" : 101226,
"nscannedObjectsAllPlans" : 101226,
"nscannedAllPlans" : 101226,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 242,
"indexBounds" : {u'epoch_creation_date': [[{u'$minElement': 1}, {u'$maxElement': 1}]]

},
"server" : "vmmongodb:27017"

for the following query:

active_collection.find(
{'epoch_creation_date': {'$exists': True}}, 
{"_id": 0, "related_hashtags":1})
.hint([("epoch_creation_date", ASCENDING)]).explain()
Oscar Moya
  • 47
  • 1
  • 11
  • You would need to tell us how these indexes are defined. – Philipp Nov 29 '13 at 12:39
  • Hi Phillipp, do you mean how did I create the index? I used active_collection.create_index([("epoch_creation_date", ASCENDING),("related_hashtags", ASCENDING)]) for example – Oscar Moya Nov 29 '13 at 12:42
  • I mean the ensureIndex calls you used to create the indexes. – Philipp Nov 29 '13 at 12:44
  • Hmm looking again query 2 makes no sense as to why it is the same nscanned as query 3 and query 4, in fact there is no reason why query 4 should have the same nscanned as query 3 – Sammaye Nov 29 '13 at 13:00
  • As @Philipp says, what indexes do you ACTUALLY have? – Sammaye Nov 29 '13 at 13:00
  • @Philipp in PyMongo indexes can be created with create_index and ensure_index, with slight differences in their behaviour: http://stackoverflow.com/questions/5912661/pymongo-mongodb-create-index-or-ensure-index . I used create_index coding instructions as: `active_collection.create_index([("epoch_creation_date", ASCENDING),("related_hashtags", ASCENDING)])` – Oscar Moya Nov 29 '13 at 13:01
  • Also can you give an `explain()` for all of these queries? – Sammaye Nov 29 '13 at 13:03
  • @Philipp and @Sammaye. Firts many thanks to both for answering so fast!!! I have four indexes active, 1st `[("epoch_creation_date", ASCENDING)]` 2nd `[("related_hashtags", ASCENDING)]` 3rd `[("epoch_creation_date", ASCENDING),("related_hashtags", ASCENDING)]` and 4th `[("related_hashtags", ASCENDING),("epoch_creation_date", ASCENDING)]` I included 3rd and 4th because understood that in compund indexes the order was relevant – Oscar Moya Nov 29 '13 at 13:07
  • @Sammaye, both millis and nscanned have been retrieved using explain() at the end of the 4 queries – Oscar Moya Nov 29 '13 at 13:10
  • Can you actually do a db.colleciton.getIndexes() in the mongo console and can you give the full and formatted explain and not just the nscanned. We need to see the full index usage. – Sammaye Nov 29 '13 at 13:10
  • @OscarMoya These look like two indexes you each created three times, none of them a composite index. – Philipp Nov 29 '13 at 13:12
  • `{ "v" : 1, "key" : { "related_hashtags" : 1, "epoch_creation_date" : 1 }, "ns" : "bt_twitter.mallorca.mallorca", "name" : "related_hashtags_1_epoch_creation_date_1" } ]` – Oscar Moya Nov 29 '13 at 13:59
  • Can you edit the question with this information? It is all split up and badly formatted atm – Sammaye Nov 29 '13 at 14:03
  • @OscarMoya Are you aware that you can edit your question? Never mind, I edited it for you. – Philipp Nov 29 '13 at 14:08
  • Sorry :( Is the first time I do a question here and am not used. I add to the question the full explain obtained... – Oscar Moya Nov 29 '13 at 14:10
  • @Sammaye I deleted the comments with the badly formatted text/code. Thanks Philipp for editing!!!... slowly learning :) – Oscar Moya Nov 29 '13 at 14:33
  • Is that explain of the first query? If so it isn't using any index at all... – Sammaye Nov 29 '13 at 14:33
  • I dont @Philipp s answer is right since by your own english the first query is not using an index even when one exists which means that epoch_creation_date is not an effective index at all. – Sammaye Nov 29 '13 at 14:56
  • @Sammaye there was an error in the full explain I copied in the question, probably I did something wrong querying in the shell. Repeated the query and full explain via code (with Pymongo) and got the results that you see now in the edited question. – Oscar Moya Nov 29 '13 at 15:12

2 Answers2

2

You created a compound index (named epoch_creation_date_1_related_hashtags_1), but you aren't using it in those hints. Instead of that you are using the two single-field indexes you also created (related_hashtags_1 and epoch_creation_date_1) in different order.

Of those two indexes, only epoch_creation_date_1 is effective, because you aren't querying for both fields. You are only querying for one, and this is 'epoch_creation_date': {'$exists': True}. The field-filtering which you perform with {"_id": 0, "related_hashtags":1} is done on the documents which were found by that query. At that point, indexes are of no use anymore. That means any index on related_hashtags won't be able to increase performance on this query. The compound index (when you would actually use it) might be better than no index at all, but not as good as the index on epoch_creation_date only.

Philipp
  • 67,764
  • 9
  • 118
  • 153
  • That makes sense, but keep being confused. Reading Mongo docs at: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/ They say that `codedb.users.ensureIndex( { status: 1, user: 1 } )` would be a good index for `db.users.find( { status: "A" }, { user: 1, _id: 0 } )` – Oscar Moya Nov 29 '13 at 12:51
  • @OscarMoya But it seems like this wasn't what you did. That's why I asked you to provide the ensureIndex commands you used to create the indexes. – Philipp Nov 29 '13 at 12:55
  • I followed the instructions at http://api.mongodb.org/python/current/tutorial.html#indexing, to create indexes. I start to think that PyMongo could not be supporting the creation of such compound indexes for multikey fields (what sound in any case unlikely to me). The docs of PyMongo are not very explicit on this. – Oscar Moya Nov 29 '13 at 13:14
  • @OscarMoya Updated answer based on the index declarations you gave us. – Philipp Nov 29 '13 at 14:12
  • Yes, this makes real sense to me now. Thanks a lot to you both! I will tick this answer as the one solving the question. A last question, if you don't mind... is there a way I can give also credit in stack overflow to Sammaye for his help?? – Oscar Moya Nov 29 '13 at 14:45
  • @OscarMoya When you collect 15 reputation you will be able to [upvote answers](http://stackoverflow.com/help/privileges/vote-up). – Philipp Nov 29 '13 at 14:49
1

Ok after reading the question more I understand the problem. The multikey index will write an index entry PER multivalue. This means if you have 3 values per related_hashtags per document your index is actually 3x the size and has 3x the number of values to scan (if my math adds up there...).

nscanned is a counter for how times a document had to be looked at (note counter, not a specific number of unique documents looked at), this means that due to the multikey index you had to scan roughly 3x the amount of (same) documents you normally would for the first query.

This is a known caveat with multikey indexes and why you should be careful about just throwing them around like this.

I believe the reason why the third query is so slow is because multikey indexes cannot support indexOnly cursors so MongoDB could not use covered queries there.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • You are right. Other readers can see this at: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/ The docs say: "An index cannot cover a query if: any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index index and cannot support a covered query" – Oscar Moya Nov 29 '13 at 14:55