0

I have 19 Million records in my mongo collection. Format of my collection is:

{
    "_id" : ObjectId("5992d5a5e7f31a5e90abb881"),
    "_class" : "com.abc.Try",
    "field1_code" : "mycode_sdsvmnsbd7986fskljfnsv89s7fmnslfsd78",
    "field2_id" : "5992d5a5e7f31a5e90abb87e",
    "field3_id" : NumberLong(1681703),
    "field4_id" : NumberLong(40119344),
    "field5_create_date" : ISODate("2015-05-15T09:17:46.000Z"),
    "field6_update_date" : ISODate("2015-05-31T08:53:59.000Z"),
    "field7_status" : "active",
    "field8_status" : "active"
}

I am using Spring batch : org.springframework.batch.item.ItemReader<MyCollection>

to read my collection but its reading is quite slow. So I created a new index using command:

db.monitoring_profile.createIndex({ field7_status: 1, field8_status: 1, field4_id: 1})

Still its slow. Please suggest. I believe its an issue of mongodb indexing.

My Mongo Query is :

{'field7_status': 'active', 'field8_status' : 'active', 'field4_id': { $in: [.... 10000 values....] }}

My Explain query result is:

db.coll.find({'field7_status': 'active', 'field8_status' : 'active', 'field4_id': { $in: [40009361, 40006884] }}).explain()

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "mycollection.coll",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [ 
                {
                    "field7_status" : {
                        "$eq" : "active"
                    }
                }, 
                {
                    "field8_status" : {
                        "$eq" : "active"
                    }
                }, 
                {
                    "field4_id" : {
                        "$in" : [ 
                            40006884.0, 
                            40009361.0
                        ]
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "EOF"
        },
        "rejectedPlans" : []
    },
    "serverInfo" : {
        "host" : ".......",
        "port" : 27017,
        "version" : "3.4.2",
        "gitVersion" : "........"
    },
    "ok" : 1.0
}
Shashank
  • 712
  • 15
  • 33
  • 1
    10,000 values as an argument to `$in`. Why? Why do you need a query with 10,000 or more results? What is the purpose? You cannot just say "this is slow" without qualifying what it is that you actually need to do. We can break this down to you are either taking completely the wrong approach or you simply have not developed the right architecture for solving the actual problem – Neil Lunn Aug 22 '17 at 06:24
  • I even tried with 1000 and 100. Its still slow. I don't have any other alternative to pass values in $in. Please suggest if you have any (as there are no joins in mongo). – Shashank Aug 22 '17 at 06:34
  • 1
    [`$lookup`](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) looks pretty much like a "join" to me ( and not exactly a new thing either ). And if `'field_4` is actually the most limiting property then it should be the very first item in the compound index, and not the "third". – Neil Lunn Aug 22 '17 at 06:40
  • could you please report how does db.coll.find([QUERY]).explain() look like? – Minh Tuan Nguyen Aug 22 '17 at 06:42
  • @minhtuannguyen updated in description. – Shashank Aug 22 '17 at 07:03
  • @NeilLunn data (ids) in $in is not coming from another collection. Its coming from mysql database. So I think $lookup wont work in this case. Please suggest – Shashank Aug 22 '17 at 07:04
  • @Shashank maybe it will not resolve the issue, but did you try to use $hint in your query and force to use a specific index? https://docs.mongodb.com/manual/reference/operator/meta/hint/ – Daniele Tassone Aug 22 '17 at 08:06

2 Answers2

0

The winningPlan of the Explain looks very strange. Maybe you could try to use $and instead of $in to define a range. For example the range (20, 60):

db.test.find({'field7_status': 'active', 'field8_status' : 'active', $and : [{'field4_id' : {$gt : 20}, }, {'field4_id' : {$lt : 60}, }] })
Minh Tuan Nguyen
  • 1,026
  • 8
  • 13
  • Thanks but the input values for field4_id is random and not coming from mongodb, its coming from some other source. And if I will apply range then lot of values will be picked which I don't even want to process. – Shashank Aug 24 '17 at 02:50
0

Thanks everyone Issue is resolved. Actually problem was not with indexing, as @Neil suggests $in was the culprit over there. I replaced $in with $lookup (see MongoDb Post formatting of $lookup aggregation with limited data) and I am getting the results. Thanks @neil and others.

Shashank
  • 712
  • 15
  • 33