
I'm using Django 1.7.7.

I'm wondering if anyone has experienced this. This is my query:

events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).filter(type__in=[...]).order_by('date')

When I then try to paginate it:

p = Paginator(events, 10)
p.count  # Gives 91

event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]

print len(event_ids)  # Still 91
print len(set(event_ids))  # 75

I noticed that if I removed the .order_by, I didn't get any duplicates. I then tried .order_by on its own, with Event.objects.all().order_by('date'), which also gave no duplicates.

Finally, I tried this:

events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date')

p = Paginator(events, 10)
events.count()  # Gives 131
p.count  # Gives 131

event_ids = []
for i in xrange(1, p.count / 10 + 2):
    event_ids += [i.id for i in p.page(i)]

len(event_ids)  # Gives 131
len(set(event_ids))  # Gives 118

... and there are duplicates. Can anyone explain what's going on?

I dug into the Django source (https://github.com/django/django/blob/master/django/core/paginator.py#L46-L55) and it seems to have something to do with how Django slices the object_list.
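For reference, the relevant part looks roughly like this (simplified from the linked source; each page is just a plain slice of object_list, which for a queryset means a separate SQL query with its own LIMIT/OFFSET):

# Simplified sketch of Paginator.page() from the source linked above:
def page(self, number):
    number = self.validate_number(number)
    bottom = (number - 1) * self.per_page
    top = bottom + self.per_page
    if top + self.orphans >= self.count:
        top = self.count
    # object_list is the queryset, so this slice runs its own SQL query
    return Page(self.object_list[bottom:top], number, self)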

Any help is appreciated. Thanks.

Edit: distinct() has no effect on the duplicates. There aren't any duplicates in the database, and I don't think the query itself introduces any ([e for e in events.iterator()] doesn't produce any duplicates). It only happens when the Paginator slices the queryset.
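For what it's worth, the overlap can be reproduced without the Paginator at all by slicing the queryset twice by hand; each slice runs as its own LIMIT/OFFSET query, which is exactly what the Paginator does per page:

first_page_ids = [e.id for e in events[0:10]]    # separate query: LIMIT 10
second_page_ids = [e.id for e in events[10:20]]  # separate query: LIMIT 10 OFFSET 10

# This intersection comes back non-empty, even though iterating the whole
# queryset in a single query produces no duplicates.
print(set(first_page_ids) & set(second_page_ids))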

Edit 2: Here's a more complete example:

In [1]: from django.core.paginator import Paginator

In [2]: from django.db.models import Q

In [3]: from django.utils import timezone

In [4]: my_date = timezone.now()

In [5]: events = Event.objects.filter(
   ...:     Q(date__gt=my_date) | Q(date__isnull=True)
   ...: ).order_by('date')

In [6]: events.count()
Out[6]: 134

In [7]: p = Paginator(events, 10)

In [8]: p.count
Out[8]: 134

In [9]: event_ids = []

In [10]: for i in xrange(1, p.num_pages + 1):
    ...:     event_ids += [j.id for j in p.page(i)]
    ...:

In [11]: len(event_ids)
Out[11]: 134

In [12]: len(set(event_ids))
Out[12]: 115
davidvuong
  • Have you tried the [distinct()](https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.distinct) method? – Gocht Jul 31 '15 at 01:32
  • No, I haven't, but I don't think distinct would help. The results from the queryset aren't duplicated; I think it's the way they're pulled out of the queryset (i.e. the slicing). Running `len(set([i.id for i in events])) == events.count()` yields `True`. – davidvuong Jul 31 '15 at 01:38
  • I don't understand the point of your for-loop. Can't you just use `event_ids = [i.id for i in events]`? – malisit Jul 31 '15 at 01:48
  • @paradoksumsu: It's just an example. I'm just trying to demonstrate that calling `paginator.page(n)` gives duplicate results. – davidvuong Jul 31 '15 at 04:13
  • When you use this pagination in a template, do you still get duplicates? – malisit Jul 31 '15 at 05:35
  • I'm not sure. I'm not using Django templates, I'm using Django REST framework's list mixin https://github.com/tomchristie/django-rest-framework/blob/version-2.4.x/rest_framework/mixins.py#L69-L98 – davidvuong Jul 31 '15 at 06:36
  • I see. I cannot help with that. Good luck! – malisit Jul 31 '15 at 23:58
  • I would like to see `events.count()` before you create the paginator since you claim the pagination creates duplicates. – James Lin Aug 03 '15 at 22:40
  • Just to be sure, there's nothing special about your model, right? `id` is the standard unique pk field? – toad013 Aug 04 '15 at 04:21
  • @JamesLin I edited my question (ref Edit2) – davidvuong Aug 04 '15 at 05:19
  • @toad013 Yep. There's nothing special about the id field. In addition, the `date` field is defined as `models.DateTimeField(blank=True, null=True)` in my `models.py` – davidvuong Aug 04 '15 at 05:19

2 Answers


A shot in the dark, but I think I know what it is. I wasn't able to reproduce this with SQLite, but I could with MySQL. When MySQL sorts on a column where many rows share the same value (or are NULL), the order of those rows isn't guaranteed to be stable, so the same rows can show up in more than one slice.

The pagination slicing basically issues an SQL statement of the form SELECT ... FROM ... WHERE (date > D OR date IS NULL) ORDER BY date ASC LIMIT X OFFSET Y.

When date is NULL, I'm not sure how MySQL orders it. When I ran two SQL queries, LIMIT 10 and LIMIT 10 OFFSET 10, they returned sets containing some of the same rows, while a single LIMIT 20 produced a unique set.
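If you want to confirm what each page actually sends to the database, you can print the SQL for the slices the Paginator would take (using the events queryset from the question):

# Slicing a queryset is lazy; printing qs.query shows the SQL it would run,
# including the LIMIT/OFFSET added by the slice.
print(events[0:10].query)    # ... ORDER BY date ASC LIMIT 10
print(events[10:20].query)   # ... ORDER BY date ASC LIMIT 10 OFFSET 10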

You can update your order_by to include a unique field as a tie-breaker, e.g. order_by('date', 'id'), so that rows with identical (or NULL) dates always come back in the same order. That should fix it.
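Something along these lines, keeping date as the primary sort and only using id to break ties (untested against your exact models):

# The primary key as a tie-breaker gives rows with equal or NULL dates a
# stable order, so consecutive LIMIT/OFFSET queries can't overlap.
events = Event.objects.filter(
    Q(date__gt=my_date) | Q(date__isnull=True)
).order_by('date', 'id')

p = Paginator(events, 10)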

toad013
  • It looks like the same thing happens on Postgres. I've updated `order_by` to include `id` and there aren't any more duplicates when the Paginator slices the queryset. Thanks so much! I'd still really like to understand how all of this works, so to clarify: it's the combination of sorting with NULLs and LIMIT + OFFSET that causes the weirdness? I found this SO answer helpful and somewhat related: http://stackoverflow.com/questions/9401314/postgresql-odd-offset-limit-behavior-records-order (I'd give you a thumbs up, but I don't meet the minimum requirements, I'm afraid). – davidvuong Aug 04 '15 at 05:43

Try using .distinct() on your queryset before passing it to the Paginator.
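For example, assuming events is the queryset from the question:

# De-duplicate rows at the SQL level before handing them to the Paginator.
p = Paginator(events.distinct(), 10)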