I would like an opinion from MongoDB experts on which query optimizations or feature options I can apply to make a read query faster when such a large set of documents has to be returned in one go (a single find query). Perhaps someone in the MongoDB community has faced a similar issue and has better ideas on how to resolve it.
I am using MongoDB 2.6.
The requirement is to fetch all the records in one query, since these documents are written into an Excel sheet for users to download.
I have a large user collection, somewhere around 2,000,000 documents.
User Collection fields:
{
_id: ObjectId("54866f133345a61b6b9a701"),
appId: "20fabc6",
firstName: "FN",
lastName: "LN",
email: "ln@1.com",
userType: "I",
active: "true",
lastUpdatedBy: "TN",
lastUpdatedDate: ISODate("2013-01-24T05:00:00Z"),
createdTime: ISODate("2011-01-24T05:00:00Z"),
}
I have a find query which currently tries to fetch 900,000 documents from the User collection. The query appears to get stuck while fetching that many documents.
Below is the find query:
db.test.User.find({"appId": "20fabc6"}).sort({"appId" : 1});
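For reference, this is roughly how I have tried iterating the result in the shell instead of materializing everything at once (the projected field list and batch size are just examples; since the filter is an equality match on appId, the sort adds nothing and is dropped here):

```javascript
// Project only the fields needed for the Excel sheet, and raise the batch
// size so the driver makes fewer round trips to the server.
var cursor = db.test.User.find(
    { "appId": "20fabc6" },
    { "firstName": 1, "lastName": 1, "email": 1, "userType": 1, "active": 1 }
).batchSize(10000);

while (cursor.hasNext()) {
    var doc = cursor.next();
    // in the real export, this row is written to the Excel sheet
}
```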
Query Function:
import java.util.List;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public List<Users> findByAppId(Object[] appIds) {
    Query query = new Query();
    query.addCriteria(Criteria.where("appId").in(appIds));
    // Materializes every matching document into a single in-memory List
    return mongoTemplate.find(query, Users.class);
}
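One alternative I have been considering (a sketch, assuming the legacy com.mongodb driver that ships with Spring Data for MongoDB 2.6) is dropping down to the driver and streaming documents through a DBCursor, so rows can be written to the Excel file as they arrive instead of holding 900,000 objects in a List:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

void exportByAppId(DBCollection users, String appId) {
    DBObject filter = new BasicDBObject("appId", appId);
    // batchSize controls how many documents each server round trip returns;
    // 10000 is only an example value
    DBCursor cursor = users.find(filter).batchSize(10000);
    try {
        while (cursor.hasNext()) {
            DBObject doc = cursor.next();
            // write doc straight into the Excel row here instead of
            // accumulating the whole result set in memory
        }
    } finally {
        cursor.close();
    }
}
```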
I have created an index on the appId field, but the query still takes too long. I ran a count for the same filter, and it matches 900,000 documents:
db.test.User.find({"appId": "20fabc6"}).count();
900000
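To double-check that the index is actually used, I also ran explain() (this is the MongoDB 2.6 form; the interesting field is "cursor"):

```javascript
// With an index on appId this should report cursor: "BtreeCursor appId_1";
// "BasicCursor" would mean the query is doing a full collection scan.
db.test.User.find({ "appId": "20fabc6" }).explain();
```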
Below are some options I can think of to reduce the number of documents:
1) Adding more fields to the filter, which still leaves a large number of records:
db.test.User.find({"appId": "20fabc6", "active": "true"}).count();
700000
2) Limiting the number of records with MongoDB's limit() for range queries, which conflicts with our first requirement: downloading all the user data into the Excel sheet in one go.
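A variant of option 2 I have considered is fetching the data in _id ranges (keyset pagination) and concatenating the chunks before writing the Excel file, so each individual query stays small without using skip(). This is only a sketch; I assume a compound index on { appId: 1, _id: 1 } would be needed for it to be efficient:

```javascript
// Each query seeks past the last _id already seen, so no chunk pays the
// cost of skip(); the chunks are appended to the export one after another.
var lastId = null;
var batch;
do {
    var filter = { "appId": "20fabc6" };
    if (lastId !== null) {
        filter._id = { $gt: lastId };
    }
    batch = db.test.User.find(filter).sort({ _id: 1 }).limit(50000).toArray();
    if (batch.length > 0) {
        lastId = batch[batch.length - 1]._id;
        // append this batch's rows to the Excel export here
    }
} while (batch.length > 0);
```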
Will aggregation with a cursor help, or will sharding across a cluster help, if we have to execute the above find query and fetch that many documents (900,000) in one go?
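For context, this is the aggregation-with-cursor form I had in mind (MongoDB 2.6 added the cursor option to aggregate, so results are streamed in batches rather than returned as a single size-limited document; the projected fields and batch size are just examples):

```javascript
var cursor = db.test.User.aggregate(
    [
        { $match: { "appId": "20fabc6" } },
        { $project: { firstName: 1, lastName: 1, email: 1, active: 1 } }
    ],
    { cursor: { batchSize: 10000 } }
);
while (cursor.hasNext()) {
    var doc = cursor.next();
    // write doc to the Excel export here
}
```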
I would appreciate any help or pointers on how to resolve this.
Thanks.