Mongodb db.collection.distinct() on aws documentdb doesn't use index

Question

Transitioning to new AWS documentDB service. Currently, on Mongo 3.2. When I run db.collection.distinct("FIELD_NAME") it returns the results really quickly. I did a database dump to AWS document DB (Mongo 3.6 compatible) and this simple query just gets stuck.

Here's my .explain() and the indexes on the working instance versus AWS documentdb:

Explain function on working instance:

> db.collection.explain().distinct("FIELD_NAME")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "db.collection",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "$and" : [ ]
                },
                "winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                                "_id" : 0,
                                "FIELD_NAME" : 1
                        },
                        "inputStage" : {
                                "stage" : "DISTINCT_SCAN",
                                "keyPattern" : {
                                        "FIELD_NAME" : 1
                                },
                                "indexName" : "FIELD_INDEX_NAME",
                                "isMultiKey" : false,
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 1,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "FIELD_NAME" : [
                                                "[MinKey, MaxKey]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },

Explain on AWS documentdb, not working:

rs0:PRIMARY> db.collection.explain().distinct("FIELD_NAME")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "db.collection",
                "winningPlan" : {
                        "stage" : "AGGREGATE",
                        "inputStage" : {
                                "stage" : "HASH_AGGREGATE",
                                "inputStage" : {
                                        "stage" : "COLLSCAN"
                                }
                        }
                }
        },
}

Index on both of these instances:

        {
                "v" : 1,
                "key" : {
                        "FIELD_NAME" : 1
                },
                "name" : "FIELD_INDEX_NAME",
                "ns" : "db.collection"
        }

Also, this database has a couple million documents but there are only about 20 distinct values for that "FIELD_NAME". Any help would be appreciated.

I tried it with .hint("index_name") and that didn't work. I tried clearing plan cache but I get Feature not supported: planCacheClear

My impression is that while it supports the same query operations, documentdb is architected really differently than mongodb is & that documentdb lacks many of the features that mongodb has, particularly around around aggregation (https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis-aggregation-pipeline.html). I know `distinct` is an "aggregation" command, so perhaps it just doesn't support that well? — klhr, Feb 16 '19 at 01:24
It's probably worth doing a sanity check like `db.collection.findOne({FIELD_NAME: value}).explain()` to make sure that the index got created correctly? — klhr, Feb 16 '19 at 01:25
@willis I think you're on the right track. I'll see if I can figure out some other work around and read more about the aggregation pipeline to do a better query. I tried both: ```db.fetched.explain().findOne({CLASS:"CALA"})``` and ```db.fetched.findOne({CLASS:"CALA"}).explain()```. They both throw an error like: ```[thread1] TypeError: db.fetched.explain(...).findOne is not a function``` ```E QUERY [thread1] TypeError: db.fetched.findOne(...).explain is not a function``` Thanks for pointing me in the right track though. — Bhavik Shah, Feb 17 '19 at 03:54

score 1 · Answer 1 · answered Apr 24 '19 at 08:26

1

COLLSCAN and IXSCAN don't have too much difference in this case, both need to scan all the documents or index entries.

answered Apr 24 '19 at 08:26

Mark

11
2

Mongodb db.collection.distinct() on aws documentdb doesn't use index

1 Answers1