
I would like an opinion from MongoDB experts on what MongoDB query optimizations or feature options I can apply to make a read query faster over such a large collection, so that the full result set is available in one go (a single find query).

Maybe someone in the MongoDB community has faced a similar issue and has some thoughts on how to resolve it.

I am using MongoDB 2.6.

The requirement is to fetch all the records in one query, since these documents are exported to an Excel sheet for users to download.

I have a large collection of users, somewhere around 2,000,000 documents in the User collection.

User Collection fields:

{
    _id: ObjectId("54866f133345a61b6b9a701"),
    appId: "20fabc6",
    firstName: "FN",
    lastName: "LN",
    email: "ln@1.com",
    userType: "I",
    active: "true",
    lastUpdatedBy: "TN",
    lastUpdatedDate: ISODate("2013-01-24T05:00:00Z"),
    createdTime: ISODate("2011-01-24T05:00:00Z")
}

I have a find query which currently tries to fetch 900,000 documents from the User collection. The query appears to get stuck while fetching that many documents.

Below is the find query:

db.test.User.find({"appId": "20fabc6"}).sort({"appId" : 1});

Query Function:

List<Users> findByAppId(Object[] appIds) {
    Query query = new Query();
    query.addCriteria(Criteria.where("appId").in(appIds));
    return mongoTemplate.find(query, Users.class);
}
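Materializing all 900,000 mapped objects into a single `List` before writing the Excel sheet is a likely memory bottleneck on the application side. Below is a sketch of a cursor-based alternative, assuming the 2.x Java driver that Spring Data MongoDB used at the time; `writeRow()` is a hypothetical hook standing in for the Excel export:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

public void exportByAppId(String appId) {
    // Stream the result set with a DBCursor instead of building a List.
    // Assumes mongoTemplate is available and "User" is the collection name.
    DBCursor cursor = mongoTemplate.getCollection("User")
            .find(new BasicDBObject("appId", appId))
            .batchSize(1000);   // fetch 1000 documents per round trip
    try {
        while (cursor.hasNext()) {
            DBObject user = cursor.next();
            writeRow(user);     // hypothetical: append one row to the Excel sheet
        }
    } finally {
        cursor.close();         // always release the server-side cursor
    }
}
```

This keeps memory usage bounded by the batch size rather than the full result set, while still producing one Excel file in a single pass.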

I have an index on the appId field, but the query is still taking too long.

I ran a count for the query, and I can see 900,000 records matching the appId:

db.test.User.find({"appId": "20fabc6"}).count();
900000

Below are some options I can think of that would reduce the number of documents:

1) Adding more fields to filter the records, which still leaves a large number of documents:

db.test.User.find({"appId": "20fabc6", "active": "true"}).count();
700000

2) Limiting the number of records using MongoDB's limit() for range queries, which conflicts with our first requirement: to download all the user data into the Excel sheet in one go.
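Option 2 need not break the "one go" requirement if the limit is applied per page rather than overall: page through the matches in `_id` order and append every page to the same Excel file. A sketch, again assuming the 2.x Java driver, with `writeRow()` as a hypothetical export hook:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import org.bson.types.ObjectId;

public void exportInPages(DBCollection collection, String appId) {
    ObjectId lastId = null;
    while (true) {
        // Filter by appId, resuming after the last _id seen in the previous page.
        BasicDBObject filter = new BasicDBObject("appId", appId);
        if (lastId != null) {
            filter.append("_id", new BasicDBObject("$gt", lastId));
        }
        DBCursor page = collection.find(filter)
                .sort(new BasicDBObject("_id", 1))
                .limit(5000);
        if (!page.hasNext()) {
            break;              // no more matches: export is complete
        }
        while (page.hasNext()) {
            DBObject user = page.next();
            writeRow(user);     // hypothetical: append one row to the Excel sheet
            lastId = (ObjectId) user.get("_id");
        }
        page.close();
    }
}
```

A compound index such as {appId: 1, _id: 1} would let each page be served directly from the index without an in-memory sort.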

Will aggregation with a cursor help, or will sharding the cluster help, if we have to execute the above find query and fetch that many documents (900,000) in one go?

I would appreciate any help or pointers to resolve this.

Thanks.

Navi
  • Do you have any index on "appId"? Do you need all the columns to be shown in excel? How much time is it taking currently to fetch records? – Lalit Agarwal Dec 24 '14 at 16:25
  • Yes, I do have an index on appId. I have looked at the query plan and MongoDB is returning the response for the query quite fast, in 1774ms for > 900000 documents in a collection. I am also using the Spring-Data-MongoDB framework on the application side to access data from MongoDB, and I think there could be an issue with my query in how Spring MongoTemplate is accessing the data. I will debug this more on the application side by adding some break points on Spring-Data-MongoDB query retrieval to get more stats. I will keep everyone posted on the outcome. Thanks – Navi Dec 24 '14 at 19:03
  • `db.test.User.find({"appId": "20fabc6"}).sort({"appId" : 1});` When you have an ascending index on `appId`, the sort() chain is not required. Remove the `sort` chain and simply run `db.test.User.find({"appId": "20fabc6"})`. Store data in Shards and make `appId` the shard key and then aggregate the results with a `$match` stage or do a simple `find`. – BatScream Dec 24 '14 at 20:03
  • @BatScream - Thanks for your response. As per MongoDB docs, Sort on index performs better - http://docs.mongodb.org/manual/tutorial/sort-results-with-indexes/ However I have checked with the query plan and without sort there is minor better performance in ms while returning the results. So, I will stick without sort for now. – Navi Dec 25 '14 at 01:55

1 Answer


Your sort() is unnecessary: you are finding only documents with an appId of 20fabc6, so sorting by that same appId is pointless, since it is identical for every record returned.

Create an index on the appId field:

db.test.User.ensureIndex({"appId":1})

With that index in place, your query should scan only the 900000 matching documents. You can double-check this with the performance metadata from the .explain() method on your find:

db.test.User.find({"appId": "20fabc6"}).explain()
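The same check can be run from the application side through the 2.x Java driver. On MongoDB 2.6, the explain output's cursor field should read "BtreeCursor appId_1" when the index is used, rather than "BasicCursor". A minimal sketch, assuming collection is the DBCollection for User:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;

// Fetch the query plan for the same find from Java.
DBObject plan = collection.find(new BasicDBObject("appId", "20fabc6")).explain();
System.out.println(plan.get("cursor"));   // "BtreeCursor appId_1" means the index was used
System.out.println(plan.get("nscanned")); // number of index entries examined
```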