10

I am familiar with the best practice of range based pagination on large MongoDB collections, however I am struggling with figuring out how to paginate a collection where the sort value is on a non-unique field.

For example, I have a large collection of users, and there is a field for the number of times they have done something. This field is defintely non-unique, and could have large groups of documents that have the same value.

I would like to return results sorted by that 'numTimesDoneSomething' field.

Here is a sample data set:

{_id: ObjectId("50c480d81ff137e805000003"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000005"), numTimesDoneSomething: 9}
{_id: ObjectId("50c480d81ff137e805000006"), numTimesDoneSomething: 7}
{_id: ObjectId("50c480d81ff137e805000007"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000002"), numTimesDoneSomething: 15}
{_id: ObjectId("50c480d81ff137e805000008"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000009"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000004"), numTimesDoneSomething: 12}
{_id: ObjectId("50c480d81ff137e805000010"), numTimesDoneSomething: 1}
{_id: ObjectId("50c480d81ff137e805000011"), numTimesDoneSomething: 1}

How would I return this data set sorted by 'numTimesDoneSomething' with 2 records per page?

Community
  • 1
  • 1
Adam Duro
  • 882
  • 1
  • 9
  • 21

2 Answers2

6

@cubbuk shows a good example using offset (skip) but you can also mould the query he shows for ranged pagination as well:

db.collection.find().sort({numTimesDoneSomething:-1, _id:1})

Since the _id here will be unique and you are seconding on it you can actually then range by _id and the results, even between two records having numTimesDoneSomething of 12, should be consistent as to whether they should be on one page or the next.

So doing something as simple as

var q = db.collection.find({_id: {$gt: last_id}}).sort({numTimesDoneSomething:-1, _id:1}).limit(2)

Should work quite good for ranged pagination.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • Won't `find({_id: last_id})` only return me a single record? – Adam Duro Jan 10 '13 at 08:41
  • 1
    find query needs to be greater than last_id doesn't it? Your solution also covers the case of new insertions +1 for that =) – cubbuk Jan 10 '13 at 08:41
  • 1
    @AdamDuro Nice spot :) Fixed it now – Sammaye Jan 10 '13 at 08:45
  • 4
    Sorry to necro this answer but this can't be right. The secondary sort on _id doesn't preclude docs with *older* ids from coming *later* in the pagination. If you simply query for $gt the last id you'll throw out all older docs which may have fewer numTimes. For this to work I believe you'd need to query for `$or: [{numtimes: last_numTimes, _id: last_id}, {numTimes: {$lt: last_numTimes}}`, pardon the code paraphrasing. If I'm incorrect, please chastise me. I'm struggling with the problem of non-unique range pagination currently myself and am looking for answers. – numbers1311407 Sep 15 '13 at 23:52
  • @numbers1311407 No this should work, you get the last _id of the sort and get all _ids of the sort after, `last_id` represents the last_id you get from the query results as such it should represent the end of the last page and the start of the next – Sammaye Sep 16 '13 at 07:03
  • 1
    @Sammaye right but the _id is only the secondary sort and only affects the order if the `numX` is *the same*. The results could very well look like `{numX: 2, _id: 2}, {numX: 2, _id: 3}, {numX: 1, _id: 1}`. In which case, if your page ended on the 2nd doc, would result in a "next page" query of `_id > 3`, which would throw out the third result. In a real query you'd throw out a lot more. – numbers1311407 Sep 16 '13 at 12:47
  • @numbers1311407 No because you would sort on a compound of numx and _id which means your first page would be id 1 and 2, it cannot be _id 3, that would be the seocnd page, the secondary _id is only used to stop potential for duplication/row loss for duplicate values for numx – Sammaye Sep 16 '13 at 12:52
  • @numbers1311407 If you was sorting in desc manner then your statement would be partially correct however then your query is actually correct and it should get _id 1 – Sammaye Sep 16 '13 at 12:58
  • @numbers1311407 I have added a bit more here: http://stackoverflow.com/questions/18819076/ranged-pagination-when-querying-sorting-on-dynamic-non-unique-fields-in-mongo/18822752#18822752 hopefully it should explain better – Sammaye Sep 16 '13 at 13:18
  • @numbers1311407 got your solution: http://stackoverflow.com/questions/5525304/how-to-do-pagination-using-range-queries-in-mongodb there is a way of actually limiting the position of the index for the find by using $min and $max – Sammaye Sep 16 '13 at 17:00
  • @Sammaye Yes this is a good find. I read this one before and it could help. Thing is that it would need multiple indexes for every query combination, sorted in the same direction, and probably havoc would ensue if the sort orders were mixed. Still it's something to think about, thanks for looking. – numbers1311407 Sep 16 '13 at 17:20
  • @numbers1311407 Until https://jira.mongodb.org/browse/SERVER-3071 is completed you would have to do that anyway – Sammaye Sep 16 '13 at 17:22
  • For whoever downvoted this answer is quite old and a lot has changed since this answer. It is left here mainly for the comments and not so much the answer itself – Sammaye Jul 21 '14 at 22:16
  • @Sammaye, I know this is an old thread, but this seems to be just about the only result for this specific use case. Do you know of a better method to accomplish this now? – NoobSter Nov 22 '16 at 18:31
  • I do indeed, use $min and $max to actually reduce the index manually and then you can do your query on the index that results from that – Sammaye Nov 23 '16 at 08:11
3

You can sort on multiple fields in this case sort on numTimesDoneSomething and id field. Since id_ field is ascending in itself already according to the insertion timestamp, you will able to paginate through the collection without iterating over duplicate data unless new data is inserted during the iteration.

db.collection.find().sort({numTimesDoneSomething:-1, _id:1}).offset(index).limit(2)
cubbuk
  • 7,800
  • 4
  • 35
  • 62