
I have a collection of documents with a multikey index defined, but query performance is poor for just 43K documents. Is ~215ms considered poor for this query? Did I define the index correctly, given that nscanned is 43902, which equals the total number of documents in the collection?

Document:

{
    "_id": {
        "$oid": "50f7c95b31e4920008dc75dc"
    },
    "bank_accounts": [
        {
            "bank_id": {
                "$oid": "50f7c95a31e4920009b5fc5d"
            },
            "account_id": [
                "ff39089358c1e7bcb880d093e70eafdd",
                "adaec507c755d6e6cf2984a5a897f1e2"
            ]
        }
    ],
    "created_date": "2013,01,17,09,50,19,274089"
}

Index:

{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}

Query:

db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()

Explain:

{
    "cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 43902,
    "nscanned" : 43902,
    "nscannedObjectsAllPlans" : 43902,
    "nscannedAllPlans" : 43902,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 213,
    "indexBounds" : {
        "bank_accounts.bank_id" : [
            [
                ObjectId("50f7c95a31e4920009b5fc5d"),
                ObjectId("50f7c95a31e4920009b5fc5d")
            ]
        ],
        "bank_accounts.account_id" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "Not_Important"
}
Jason
  • One idea: Try flipping the order of parameters to the find to match the order specified by the index. Also, if you just do a find for bank_id, does that use the index? It should, given the order specified in ensureIndex. – WiredPrairie Feb 17 '13 at 12:09
  • @WiredPrairie: Adding a second index matching the order the find parameters didn't help. And yes, if I just do a find for bank_id, an index is used (but still scans 43K documents). – Jason Feb 17 '13 at 19:23
  • Could you just create a single index `{ "bank_accounts.bank_id" : 1 }` and try just the bank_id? It's very strange that it's not using the index (you can delete the index after the test). – WiredPrairie Feb 17 '13 at 19:42
  • I just created a simple test locally with 1000 documents that mirror the structure you have above, and did the queries you've shown above. The search scans 2 documents and returns 1 match. Are you using the most recent builds of mongodb? – WiredPrairie Feb 17 '13 at 19:57
  • @WiredPrairie: We are on v2.2.2. – Jason Feb 17 '13 at 21:16
  • @WiredPrairie: Adding the index { "bank_accounts.bank_id" : 1 } doesn't make find() use that index. Explain() still shows the compound index being used for find( {"bank_accounts.bank_id":ObjectId("50f7c95a31e4920009b5fc5d")}). So I removed the original compound index leaving just the one you suggested. Explain() still shows all documents being scanned (using the single element index). – Jason Feb 17 '13 at 21:19
  • is there any other detail you could provide that might be helpful? I couldn't replicate the problem you're seeing. Try building a new collection one by one and see if the problem persists even in a new collection. – WiredPrairie Feb 17 '13 at 21:27
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/24675/discussion-between-jason-and-wiredprairie) – Jason Feb 17 '13 at 22:35

1 Answer


I see three factors in play.

First, for application purposes, make sure that $elemMatch isn't the more appropriate query for this use case: http://docs.mongodb.org/manual/reference/operator/elemMatch/. Without it, the two conditions can be satisfied by *different* subdocuments, so a document could match even though no single bank account has both the given bank_id and account_id.
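To make the difference concrete, here is a small sketch of the two matching semantics in plain JavaScript (the document shape mirrors the question; the values "bankA", "acct1", etc. are made up for illustration):

```javascript
// A document where the two query conditions are satisfied by
// *different* subdocuments.
const doc = {
  bank_accounts: [
    { bank_id: "bankA", account_id: ["acct1"] },
    { bank_id: "bankB", account_id: ["acct2"] },
  ],
};

// find({"bank_accounts.bank_id": "bankA", "bank_accounts.account_id": "acct2"})
// matches: each condition only needs *some* subdocument to satisfy it.
const plainMatch =
  doc.bank_accounts.some(s => s.bank_id === "bankA") &&
  doc.bank_accounts.some(s => s.account_id.includes("acct2"));

// find({bank_accounts: {$elemMatch: {bank_id: "bankA", account_id: "acct2"}}})
// does not match: $elemMatch requires one subdocument to satisfy both.
const elemMatch = doc.bank_accounts.some(
  s => s.bank_id === "bankA" && s.account_id.includes("acct2")
);

console.log(plainMatch, elemMatch); // true false
```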

Second, the high nscanned value can likely be accounted for by querying on each of the field values independently: .find({"bank_accounts.bank_id": X}) vs. .find({"bank_accounts.account_id": Y}). You will probably see that nscanned for the full query is about equal to nscanned of the larger subquery. If the index key were being evaluated fully as a range, this would not be expected, but...
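You can check this yourself by explaining each condition on its own (a diagnostic sketch reusing the collection and values from the question) and comparing nscanned against the full query:

> db.visitor.find({"bank_accounts.bank_id": ObjectId("50f7c95a31e4920009b5fc5d")}).explain()
> db.visitor.find({"bank_accounts.account_id": "ff39089358c1e7bcb880d093e70eafdd"}).explain()

If the full query's nscanned matches the larger of the two, the planner is effectively bounding only one key of the compound index.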

Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.

Not really sure why, but I suspect it has something to do with the nature of account_id (an array inside a subdocument that is itself inside an array). ~200ms seems about right for an nscanned that high.

A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:

{"bank_accounts": [
    {
        "bank_id": X,
        "account_id": Y
    },
    {
        "bank_id": X,
        "account_id": Z
    }
]}

instead of:

{"bank_accounts": [{"bank_id": X, "account_id": [Y, Z]}]}
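A one-off migration to that shape could look like the following sketch in plain JavaScript (the function name denormalizeBankAccounts is made up; field names match the question):

```javascript
// Hypothetical migration helper: split each subdocument's account_id
// array into one { bank_id, account_id } pair per account.
function denormalizeBankAccounts(doc) {
  const flattened = [];
  for (const sub of doc.bank_accounts) {
    for (const acct of sub.account_id) {
      flattened.push({ bank_id: sub.bank_id, account_id: acct });
    }
  }
  return { ...doc, bank_accounts: flattened };
}

const before = {
  bank_accounts: [{ bank_id: "X", account_id: ["Y", "Z"] }],
};
console.log(denormalizeBankAccounts(before).bank_accounts);
// → two entries: {bank_id:"X", account_id:"Y"} and {bank_id:"X", account_id:"Z"}
```

Each array element then carries both indexed fields as scalars, which is what lets the compound index bound both keys.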

My tests below show that with this organization, the query optimizer gets back to work and exerts a range on both keys:

> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
    "cursor" : "BtreeCursor blah.a_1_blah.b_1",
    "isMultiKey" : false,
    "n" : 0,
    "nscannedObjects" : 0,
    "nscanned" : 0,
    "nscannedObjectsAllPlans" : 0,
    "nscannedAllPlans" : 0,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "blah.a" : [
            [
                1,
                1
            ]
        ],
        "blah.b" : [
            [
                "A RANGE",
                "A RANGE"
            ]
        ]
    }
}
Eric
  • Adding to the above, this is what MongoDB's FAQ suggests as well – http://docs.mongodb.org/manual/faq/indexes/#how-can-i-effectively-use-indexes-strategy-for-attribute-lookups – epicwhale Feb 15 '14 at 03:44