176

Consider simple Django models Event and Participant:

class Event(models.Model):
    title = models.CharField(max_length=100)

class Participant(models.Model):
    event = models.ForeignKey(Event, db_index=True)
    is_paid = models.BooleanField(default=False, db_index=True)

It's easy to annotate events query with total number of participants:

events = Event.objects.all().annotate(participants=models.Count('participant'))

How to annotate with count of participants filtered by is_paid=True?

I need to query all events regardless of number of participants, e.g. I don't need to filter by annotated result. If there are 0 participants, that's ok, I just need 0 in annotated value.

The example from documentation doesn't work here, because it excludes objects from query instead of annotating them with 0.

Update. Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0,
        output_field=models.IntegerField()
    )))

Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below. This also works in Django 3.x

jperelli
  • 6,988
  • 5
  • 50
  • 85
rudyryk
  • 3,695
  • 2
  • 26
  • 33

6 Answers6

213

Conditional aggregation in Django 2.0+ allows you to further reduce the amount of faff this has been in the past. This will also use Postgres' filter logic, which is somewhat faster than a sum-case (I've seen numbers like 20-30% bandied around).

Anyway, in your case, we're looking at something as simple as:

from django.db.models import Q, Count
events = Event.objects.annotate(
    paid_participants=Count('participants', filter=Q(participants__is_paid=True))
)

There's a separate section in the docs about filtering on annotations. It's the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.

djvg
  • 11,722
  • 5
  • 72
  • 103
Oli
  • 235,628
  • 64
  • 220
  • 299
  • BTW, there's no such example by the documentation link, only `aggregate` usage is shown. Have you already tested such queries? (I haven't and I want to believe! :) – rudyryk Feb 06 '18 at 18:12
  • 2
    I have. They work. I actually hit a weird patch where an old (super-complicated) subquery stopped working after upgrading to Django 2.0 and I managed to replace it with a super-simple filtered-count. There is a better in-doc example for annotations so I'll pull that in now. – Oli Feb 06 '18 at 21:05
  • 1
    There are a few answers here, this is the Django 2.0 way, and below you will find the Django 1.11 (Subqueries) way, and the Django 1.8 way. – Ryan Castner Apr 29 '18 at 20:05
  • 3
    Beware, if you try this in Django <2, e.g. 1.9, it *will* run without exception, but the filter simply is not applied. So it may appear to work with Django <2, but does not. – djvg Feb 01 '19 at 09:21
  • If you need to add multiple filter you can add them in the Q() argument with separated by , as example filter=Q(participants__is_paid=True, somethingelse=value) – Tobit Apr 01 '20 at 22:53
  • 1
    You can even use RawSQL instead of Q to do something more complicated `.annotate(lines=Count("mm_items", filter=RawSQL("coalesce(qty, sug_qty) > 0", [])))` it will just paste your sql like `FILTER (WHERE (coalesce(qty, sug_qty) > 0))` – Alexander Dec 29 '21 at 12:38
  • This is brillant. Thank you. – Umut Çağdaş Coşkun Nov 07 '22 at 19:50
97

Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0, output_field=models.IntegerField()
    )))
Dan
  • 12,409
  • 3
  • 50
  • 87
rudyryk
  • 3,695
  • 2
  • 26
  • 33
  • Is this an eligible solution when the matching items are many? Let us say that i want to count click events which occurred the latest week. – SverkerSbrg Jan 17 '17 at 14:51
  • Why not? I mean, why your case is different? In the case above there may by any number of paid participants on event. – rudyryk Apr 08 '17 at 10:20
  • I think the question @SverkerSbrg is asking is whether this is inefficient for large sets, rather than whether or not it would work.... correct? Most important thing to know is that it's not doing it in python, it's creating a SQL case clause - see https://github.com/django/django/blob/master/django/db/models/expressions.py#L831 - so it'll be reasonably performant, simple example would be better than a join, but more complex versions could include subqueries etc. – Hayden Crocker Jul 18 '17 at 12:13
  • 1
    When using this with `Count` (instead of `Sum`) I guess we should set `default=None` (if not using the django 2 `filter` argument). – djvg Feb 01 '19 at 09:08
50

UPDATE

The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.

Event.objects.annotate(
    num_paid_participants=Subquery(
        Participant.objects.filter(
            is_paid=True,
            event=OuterRef('pk')
        ).values('event')
        .annotate(cnt=Count('pk'))
        .values('cnt'),
        output_field=models.IntegerField()
    )
)

I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).

For older version, the same can be achieved using .extra

Event.objects.extra(select={'num_paid_participants': "\
    SELECT COUNT(*) \
    FROM `myapp_participant` \
    WHERE `myapp_participant`.`is_paid` = 1 AND \
            `myapp_participant`.`event_id` = `myapp_event`.`id`"
})
Todor
  • 15,307
  • 5
  • 55
  • 62
  • Thanks Todor! Seems like I've found the way without using `.extra`, as I prefer to avoid SQL in Django :) I'll update the question. – rudyryk Jun 10 '15 at 10:59
  • 1
    You are welcome, btw I'm aware of this approach, but it was a non-working solution until now, that's why I didn't mention about it. However I just found that it has been fixed in `Django 1.8.2`, so i guess you are with that version and that's why its working for you. You can read more about that [here](http://stackoverflow.com/questions/29440374/django-annotate-and-count-how-to-filter-the-ones-to-include-in-count) and [here](https://code.djangoproject.com/ticket/24766) – Todor Jun 10 '15 at 12:17
  • 2
    I'm getting that this produces a None when it should be 0. Anyone else getting this? – Stefan Collier Feb 11 '18 at 15:07
  • 1
    @StefanJCollier Yes, I got `None` too. My solution was to use `Coalesce` (`from django.db.models.functions import Coalesce`). You use it like this: `Coalesce(Subquery(...), 0)`. There may be a better approach, though. – Adam Taylor Mar 02 '20 at 16:44
  • This is great because the approach in the more upvoted answer by Oli below, whlie "better" in terms of readability, results in "LEFT OUTER JOIN"s on MySQL. Which is very unfriendly in terms of performance. So upvoting both answers! – Joseph Victor Zammit Nov 30 '21 at 14:56
  • That saved my day. Especially the `OuterRef`. Thank you! – Artur Barseghyan Aug 03 '22 at 08:35
6

I would suggest to use the .values method of your Participant queryset instead.

For short, what you want to do is given by:

Participant.objects\
    .filter(is_paid=True)\
    .values('event')\
    .distinct()\
    .annotate(models.Count('id'))

A complete example is as follow:

  1. Create 2 Events:

    event1 = Event.objects.create(title='event1')
    event2 = Event.objects.create(title='event2')
    
  2. Add Participants to them:

    part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))\
              for _ in range(10)]
    part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))\
              for _ in range(50)]
    
  3. Group all Participants by their event field:

    Participant.objects.values('event')
    > <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
    

    Here distinct is needed:

    Participant.objects.values('event').distinct()
    > <QuerySet [{'event': 1}, {'event': 2}]>
    

    What .values and .distinct are doing here is that they are creating two buckets of Participants grouped by their element event. Note that those buckets contain Participant.

  4. You can then annotate those buckets as they contain the set of original Participant. Here we want to count the number of Participant, this is simply done by counting the ids of the elements in those buckets (since those are Participant):

    Participant.objects\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
    
  5. Finally you want only Participant with a is_paid being True, you may just add a filter in front of the previous expression, and this yield the expression shown above:

    Participant.objects\
        .filter(is_paid=True)\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
    

The only drawback is that you have to retrieve the Event afterwards as you only have the id from the method above.

Raffi
  • 3,068
  • 31
  • 33
  • 1
    The [aggregation docs](https://docs.djangoproject.com/en/3.1/topics/db/aggregation/#values) also discuss the use of `values()` together with `annotate()`. – djvg Mar 12 '21 at 12:52
3

For Django 3.x just write filter after the annotate:

User.objects.values('user_id')
            .annotate(sudo_field=models.Count('likes'))
            .filter(sudo_field__gt=100)

In above sudo_field is not a model field in User Model and here we are filtering the users who have likes (or xyz) more than 100.

Deepanshu Mehta
  • 1,119
  • 12
  • 9
  • 3
    This does not look like an answer to this question. It involves annotation and filtering, but the original question asks about counting, and filtering the objects that are counted, not filtering based on the resulting count or sum. – Matthijs Kooijman Feb 11 '22 at 18:25
1

What result I am looking for:

  • People (assignee) who have tasks added to a report. - Total Unique count of People
  • People who have tasks added to a report but, for task whose billability is more than 0 only.

In general, I would have to use two different queries:

Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()

But I want both in one query. Hence:

Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))

Result:

<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>
Arindam Roychowdhury
  • 5,927
  • 5
  • 55
  • 63
  • Is this an answer to the question? It looks more like a new question that you then answer yourself, that is somewhat related to the original question (the solution uses some of the same tools), but I'm not sure that this answer adds something new to the existing set of answers... – Matthijs Kooijman Feb 11 '22 at 18:29