9

I've been looking into array (multi-key) indexing on MongoDB and I have the following questions that I haven't been able to find much documentation on:

Indexes on an array of subdocuments

So if I have an array field that looks something like:

{field : [
  {a : "1"}, 
  {b : "2"}, 
  {c : "3"}
  ]
}

I am querying only on field.a and field.c individually (not both together), I believe I have a choice between the following alternatives:

  1. db.Collection.ensureIndex({field : 1});
  2. db.Collection.ensureIndex({field.a : 1}); db.Collection.ensureIndex({field.c : 1});

That is: an index on the entire array; or two indexes on the embedded fields. Now my questions are:

  • How do you visualize an index on the entire array in option 1 (is it even useful)? What queries is such an index useful for?
  • Given the querying situation I have described, which of the above two options is better, and why?
Vince Bowdren
  • 8,326
  • 3
  • 31
  • 56
Zaid Masud
  • 13,225
  • 9
  • 67
  • 88
  • 2
    +1 for well formed question. – Parvin Gasimzade Nov 27 '12 at 13:34
  • @Aid what did you end up doing? How were results? – Kevin Meredith Oct 01 '13 at 19:14
  • @Kevin for this specific query option 2 worked out better. If you are querying for equivalence on the entire contents of the array, option 1 is better. – Zaid Masud Oct 01 '13 at 19:44
  • 1
    Thanks, @Zaid. Sorry I misspelled your name. As an aside, would you have been able to index on the whole `fields` array if your array had the same field names? Example: `{field : [{a : "1"}, {a : "2"}, {a : "3"}]`? Note that each of the array's JSON object contains an **a** field. – Kevin Meredith Oct 01 '13 at 19:45

1 Answers1

8

You are correct that if you are querying only on the value of a in the field array, both indexes will, in a sense, help you make your query more performant.

However, have a look at the following 3 queries:

> db.zaid.save({field : [{a: 1}, {b: 2}, {c: 3}] });
> db.zaid.ensureIndex({field:1});
> db.zaid.ensureIndex({"field.a":1});

#Query 1
> db.zaid.find({"field.a":1})
{ "_id" : ObjectId("50b4be3403634cff61158dd0"), "field" : [ { "a" : 1 }, { "b" : 2 }, { "c" : 3 } ] }
> db.zaid.find({"field.a":1}).explain();
{
    "cursor" : "BtreeCursor field.a_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "field.a" : [
            [
                1,
                1
            ]
        ]
    }
}

#Query 2
> db.zaid.find({"field.b":1}).explain();
{
    "cursor" : "BasicCursor",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

#Query 3
> db.zaid.find({"field":{b:1}}).explain();
{
    "cursor" : "BtreeCursor field_1",
    "nscanned" : 0,
    "nscannedObjects" : 0,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "field" : [
            [
                {
                    "b" : 1
                },
                {
                    "b" : 1
                }
            ]
        ]
    }
}

Notice that the second query doesn't have an index on it, even though you indexed the array, but the third query does. Choosing your indexes based on how you intend to query your data is as important as considering whether the index itself is what you need. In Mongo, the structure of your index can and does make a very large difference on the performance of your queries if you aren't careful. I think that explains your first question.

Your second question is a bit more open ended, but I think the answer, again, lies in how you expect to query your data. If you will only ever be interested in matching on values of "fields.a", then you should save room in memory for other indexes which you might need down the road. If, however, you are equally likely to query on any of those items in the array, and you are reasonably certain that the array will no grow infinitely (never index on an array that will potentially grow over time to an unbound size. The index will be unable to index documents once the array reaches 1024 bytes in BSON.), then you should index the full array. An example of this might be a document for a hand of playing cards which contains an array describing each card in a users hand. You can index on this array without fear of overflowing beyond the index size boundary since a hand could never have more than 52 cards.

rdrkt
  • 138
  • 6
  • Does that mean if my array will grow in size, I can still index it with "fields.a"? Or I shouldn't index a growing array in any form? – KevinResoL Jul 26 '16 at 08:08