Range query with no dups

Question

I have a collection that I would like to serve out as 'pages'. The collection could get quite large, and I have read that skip is not optimal in that case. I think range queries will work just fine in my case, so I am going to try that route.

My collection will be sorted and paged on a timestamp field. I have implemented the API so that a user passes in a startDate and I return a certain number of items ('limit', max of 1000). However, I am struggling with how to avoid getting duplicates across pages when documents have the same time.

As an example (with a small page size to keep it simple): I have 6 documents, and docs 3 and 4 have the same time. With a page size of 3, asking for page one returns the first three. However, when I ask for page 2 with a startDate that is 'gte' the last doc on page one, I get a dup on page 2: the last doc from page one is the same as the first doc on page 2.
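
A minimal in-memory sketch that reproduces the problem, assuming documents are plain dicts with a hypothetical `timestamp` field (integers here for brevity) and a page size of 3. The hypothetical `naive_page` helper only stands in for `find({"timestamp": {"$gte": start}}).sort("timestamp", 1).limit(limit)` so that it runs without a MongoDB server.

```python
# Six documents; docs 3 and 4 share the same timestamp, as in the example above.
docs = [
    {"_id": 1, "timestamp": 100},
    {"_id": 2, "timestamp": 200},
    {"_id": 3, "timestamp": 300},
    {"_id": 4, "timestamp": 300},
    {"_id": 5, "timestamp": 400},
    {"_id": 6, "timestamp": 500},
]


def naive_page(collection, start, limit):
    """Hypothetical stand-in for
    find({"timestamp": {"$gte": start}}).sort("timestamp", 1).limit(limit)
    """
    matches = [d for d in collection if d["timestamp"] >= start]
    # sorted() is stable, so tied timestamps keep their list order here;
    # MongoDB does not promise any particular order for ties on the sort key.
    return sorted(matches, key=lambda d: d["timestamp"])[:limit]


page1 = naive_page(docs, start=0, limit=3)
# The next request starts at ('gte') the last timestamp seen on page one.
page2 = naive_page(docs, start=page1[-1]["timestamp"], limit=3)

print([d["_id"] for d in page1])  # [1, 2, 3]
print([d["_id"] for d in page2])  # [3, 4, 5]: doc 3 appears on both pages
```

Switching `$gte` to `$gt` does not fix it either: the second page would then start at doc 5, and doc 4 would be skipped because it shares doc 3's timestamp.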

I cannot find a range query example anywhere that deals with dates without returning dups.

It does seem unlikely that truly "timestamped" documents are ever going to collide on the same date, unless your entries are extremely frequent. But as long as everything is in insertion order then you can always range on the `ObjectId` from the `_id` field — Neil Lunn, May 13 '14 at 05:44
No guarantees on insertion order, which is why I use a separate timestamp field. Insertions are frequent, which is why I asked the question. I do have docs that have the same time, down to the millisecond. I need to order on the timestamp field but probably also need to do something with my _id field, which I suppose means my client would need to pass in the _id of the last item on each page to get the next page. I'm still not sure how to effectively avoid dups on the next page — lostintranslation, May 13 '14 at 14:26
That actually would have been useful information in your question, as it does put your use case into context. I'm still not sure about what you say regarding "insertion order", though, unless you are possibly adding documents with "back dated" timestamps or otherwise working through a feeding event queue that does that. Each record you add should be the "newest" unless you can state a case that is otherwise. Better to add this to your question rather than leave it in a comment. — Neil Lunn, May 13 '14 at 14:43
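
A minimal sketch of the approach the thread converges on, sometimes called keyset (or seek) pagination: sort on `timestamp` with `_id` as a tie-breaker, and have the client send back both the `timestamp` and the `_id` of the last item on each page. The next page then starts strictly after that (timestamp, _id) pair, so documents with identical timestamps are neither repeated nor skipped, whatever order they were inserted in. The field name `timestamp` and the helpers `next_page_filter` and `fetch_page` are hypothetical; `fetch_page` only mimics in memory what the MongoDB query would return. With PyMongo, the same filter would presumably be passed to `find()` together with `.sort([("timestamp", 1), ("_id", 1)])` and `.limit()`.

```python
def next_page_filter(last_ts=None, last_id=None):
    """MongoDB-style filter for the page after (last_ts, last_id)."""
    if last_ts is None:
        return {}  # first page: no lower bound
    return {
        "$or": [
            {"timestamp": {"$gt": last_ts}},                  # strictly later timestamp
            {"timestamp": last_ts, "_id": {"$gt": last_id}},  # same timestamp, later _id
        ]
    }


def sort_key(doc):
    # Primary order on timestamp; the unique _id breaks ties deterministically.
    return (doc["timestamp"], doc["_id"])


def fetch_page(collection, last_ts, last_id, limit):
    """Hypothetical in-memory stand-in for
    find(next_page_filter(last_ts, last_id))
        .sort([("timestamp", 1), ("_id", 1)]).limit(limit)
    """
    if last_ts is None:
        matches = collection
    else:
        # Tuple comparison encodes the same condition as the $or filter above.
        matches = [d for d in collection if sort_key(d) > (last_ts, last_id)]
    return sorted(matches, key=sort_key)[:limit]


# Same six documents as in the question: docs 3 and 4 share a timestamp.
docs = [
    {"_id": 1, "timestamp": 100},
    {"_id": 2, "timestamp": 200},
    {"_id": 3, "timestamp": 300},
    {"_id": 4, "timestamp": 300},
    {"_id": 5, "timestamp": 400},
    {"_id": 6, "timestamp": 500},
]

pages, last_ts, last_id = [], None, None
while True:
    page = fetch_page(docs, last_ts, last_id, limit=3)
    if not page:
        break
    pages.append([d["_id"] for d in page])
    # The client sends both values back to request the next page.
    last_ts, last_id = page[-1]["timestamp"], page[-1]["_id"]

print(pages)  # [[1, 2, 3], [4, 5, 6]]
```

The page boundary falls between the two tied documents, yet doc 3 is not repeated and doc 4 is not skipped. Since `_id` is unique within a collection, the (timestamp, _id) pair identifies exactly one position in the sort order; a compound index on `{timestamp: 1, _id: 1}` should let the server use the index for both the filter and the sort.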