0

I have an aggregate on a collection with about 1.6M of registers. That consult is a simple example of other more complex, but illustrate the poor optimization of index used in my opinion.

db.getCollection('cbAlters').runCommand("aggregate", {pipeline: [
{
    $match: { cre_carteraId: "31" }
},
{
    $group: { _id: { ca_tramomora: "$cre_tramoMora" },
            count: { $sum: 1 } }
}
]})

That query toke about 5 sec. The colleccion have 25 indexes configured to differents consults. The one used according to query explain is:

{
        "v" : 1,
        "key" : {
            "cre_carteraId" : 1,
            "cre_periodo" : 1,
            "cre_tramoMora" : 1,
            "cre_inactivo" : 1
        },
        "name" : "cartPerTramInact",
        "ns" : "basedatos.cbAlters"
    },

I created an index adjusted to this particular query:

{
    "v" : 1,
    "key" : {
        "cre_carteraId" : 1,
        "cre_tramoMora" : 1
    },
    "name" : "cartPerTramTest",
    "ns" : "basedatos.cbAlters"
}

The query optimizer reject this index, and suggests me to use the initial index. Output of my query explain seem like this:

{
    "waitedMS" : NumberLong(0),
    "stages" : [ 
        {
            "$cursor" : {
                "query" : {
                    "cre_carteraId" : "31"
                },
                "fields" : {
                    "cre_tramoMora" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "basedatos.cbAlters",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "cre_carteraId" : {
                            "$eq" : "31"
                        }
                    },
                    "winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                            "cre_tramoMora" : 1,
                            "_id" : 0
                        },
                        "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "cre_carteraId" : 1,
                                "cre_periodo" : 1,
                                "cre_tramoMora" : 1,
                                "cre_inactivo" : 1
                            },
                            "indexName" : "cartPerTramInact",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "cre_carteraId" : [ 
                                    "[\"31\", \"31\"]"
                                ],
                                "cre_periodo" : [ 
                                    "[MinKey, MaxKey]"
                                ],
                                "cre_tramoMora" : [ 
                                    "[MinKey, MaxKey]"
                                ],
                                "cre_inactivo" : [ 
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    },
                    "rejectedPlans" : [ 
                        {
                            "stage" : "PROJECTION",
                            "transformBy" : {
                                "cre_tramoMora" : 1,
                                "_id" : 0
                            },
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "cre_carteraId" : 1,
                                    "cre_tramoMora" : 1
                                },
                                "indexName" : "cartPerTramTest",
                                "isMultiKey" : false,
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 1,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "cre_carteraId" : [ 
                                        "[\"31\", \"31\"]"
                                    ],
                                    "cre_tramoMora" : [ 
                                        "[MinKey, MaxKey]"
                                    ]
                                }
                            }
                        }
                    ]
                }
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "ca_tramomora" : "$cre_tramoMora"
                },
                "count" : {
                    "$sum" : {
                        "$const" : 1.0
                    }
                }
            }
        }
    ],
    "ok" : 1.0
}

Then, why optimizer prefers an index less adjusted? Should indexFilterSet (result filtered for index) be true for this aggregate?

How can I improve this index, or something goes wrong with the query?

I do not have much experience with mongoDB, I appreciate any help

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Mikhael
  • 1
  • 1
  • 1
    Generally speaking, too much of indexes will spoil the party. Try to just keep the relevant ones. It's like having every other song you have in your playlist. – CS_noob Aug 10 '16 at 20:14
  • Only the initial `$match` can use an index, see http://stackoverflow.com/questions/20170636/mongodb-indexing-for-aggregates – JohnnyHK Aug 11 '16 at 13:07

2 Answers2

0

As long as you have index cartPerTramInact, optimizer won't use your cartPerTramTest index because first fields are same and in same order.

This goes with other indexes too. When there is indexes what have same keys at same order (like a.b.c.d, a.b.d, a.b) and you query use fields a.b, it will favour that a.b.c.d. Anyway you don't need that index a.b because you already have two indexes what covers a.b (a.b.c.d and a.b.d) Index a.b.d is used only when you do query with those fields a.b.d, BUT if a.b is already very selective, it's probably faster to do select with index a.b.c.d using only part a.b and do "full table scan" to find that d

JJussi
  • 1,540
  • 12
  • 12
0

There is a hint option for aggregations that can help with the index...

See https://www.mongodb.com/docs/upcoming/reference/method/db.collection.aggregate/#mongodb-method-db.collection.aggregate

malix
  • 3,566
  • 1
  • 31
  • 41