31

Django has the great new annotate() function for querysets. However I can't get it to work properly for multiple annotations in a single queryset.

For example,

tour_list = Tour.objects.all().annotate( Count('tourcomment') ).annotate( Count('history') )

A tour can contain multiple tourcomment and history entries. I'm trying to get how many comments and history entries exist for this tour. The resulting

history__count and tourcomment__count

values will be incorrect. If there's only one annotate() call the value will be correct.

There seems to be some kind of multiplicative effect coming from the two LEFT OUTER JOINs. For example, if a tour has 3 histories and 3 comments, 9 will be the count value for both. 12 histories + 1 comment = 12 for both values. 1 history + 0 comment = 1 history, 0 comments (this one happens to return the correct values).

The resulting SQL call is:

SELECT `testapp_tour`.`id`, `testapp_tour`.`operator_id`, `testapp_tour`.`name`, `testapp_tour`.`region_id`, `testapp_tour`.`description`, `testapp_tour`.`net_price`, `testapp_tour`.`sales_price`, `testapp_tour`.`enabled`, `testapp_tour`.`num_views`, `testapp_tour`.`create_date`, `testapp_tour`.`modify_date`, `testapp_tour`.`image1`, `testapp_tour`.`image2`, `testapp_tour`.`image3`, `testapp_tour`.`image4`, `testapp_tour`.`notes`, `testapp_tour`.`pickup_time`, `testapp_tour`.`dropoff_time`, COUNT(`testapp_tourcomment`.`id`) AS `tourcomment__count`, COUNT(`testapp_history`.`id`) AS `history__count` 
FROM `testapp_tour` LEFT OUTER JOIN `testapp_tourcomment` ON (`testapp_tour`.`id` = `testapp_tourcomment`.`tour_id`) LEFT OUTER JOIN `testapp_history` ON (`testapp_tour`.`id` = `testapp_history`.`tour_id`)
GROUP BY `testapp_tour`.`id`
ORDER BY `testapp_tour`.`name` ASC

I have tried combining the results from two querysets that contain a single call to annotate (), but it doesn't work right... You can't really guarantee that the order will be the same. and it seems overly complicated and messy so I've been looking for something better...

tour_list = Tour.objects.all().filter(operator__user__exact = request.user ).filter(enabled__exact = True).annotate( Count('tourcomment') )
tour_list_historycount = Tour.objects.all().filter( enabled__exact = True ).annotate( Count('history') )
for i,o in enumerate(tour_list):
    o.history__count = tour_list_historycount[i].history__count

Thanks for any help. Stackoverflow has saved my butt in the past with a lot of already-answered questions, but I wasn't able to find an answer to this one yet.

Tomasz Jakub Rup
  • 10,502
  • 7
  • 48
  • 49
tbak
  • 951
  • 7
  • 7

3 Answers3

64

Thanks for your comment. That didn't quite work but it steered me in the right direction. I was finally able to solve this by adding distinct to both Count() calls:

Count('tourcomment', distinct=True)
tbak
  • 951
  • 7
  • 7
  • 2
    ...which is still a horrible solution, as this merely filters out all duplicates from the huge result – dragonroot Jan 07 '14 at 23:06
  • 1
    This also only works with Count. I'm having a similar issue with a Count and a Sum, and while setting distinct to true keeps the Count accurate, the Sum is still being multiplied – StephenTG Mar 18 '14 at 20:32
  • Also, for a `Sum` bug description see: https://code.djangoproject.com/ticket/10060 – sobolevn Sep 21 '15 at 09:08
  • 1
    Lovely, you don't know the struggle I have been through. Thanks a lot. – chaulap Sep 08 '21 at 17:20
3
tour_list = Tour.objects.all().annotate(tour_count=Count('tourcomment',distinct=True) ).annotate(history_count=Count('history',distinct=True) )

You have to add distinct=True to get the proper result else it will return the wrong answer.

4b0
  • 21,981
  • 30
  • 95
  • 142
Javed Gouri
  • 289
  • 4
  • 5
0

I can't guarantee that this will solve your problem, but try appending .order_by() to your call. That is:

tour_list = Tour.objects.all().annotate(Count('tourcomment')).annotate(Count('history')).order_by()

The reason for this is that django needs to select all the fields in the ORDER BY clause, which causes otherwise identical results to be selected. By appending .order_by(), you're removing the ORDER BY clause altogether, which prevents this from happening. See the aggregation documentation for more information on this issue.

André Eriksson
  • 4,296
  • 2
  • 19
  • 16