1

I have the following models:

class Institution(models.Model):
    pass

class Headquarter(models.Model):
    institution = models.ForeignKey(Institution, related_name='headquarters')

class Audit(models.Model):
    headquarter = models.ForeignKey(Headquarter, related_name='audits')

And the following query (basically, if an institution has at least one audit then has_visits must be true:

Institution.objects.annotate(
    has_visits=models.Case(
        models.When(headquarters__audits=None, then=False),
        default=True,
        output_field=models.BooleanField()
    )
 )

The problem is that if an institution has 2 audits then the queryset returns duplicate rows. I imagine it has something to do with the joins at the SQL level but I'm not sure how to correct it. I found this answer but I don't think OuterRef is what I'm looking for in my situation. What's the right way to accomplish this?

Johnny Beltran
  • 701
  • 2
  • 8
  • 22

2 Answers2

3

You can work with an Exists subquery [Django-doc]:

from django.db.models import Exists, OuterRef

Institution.objects.annotate(
    has_visits=Exists(
        Audit.objects.filter(headquarter__institution=OuterRef('pk'))
    )
)

Something else that could help is to let the duplicates "collapse", for example with a Max:

from django.db.models import Max, Value
from django.db.models.functions import Coalesce

Institution.objects.annotate(
    has_visits=Coalesce(Max(models.Case(
        models.When(headquarters__audits=None, then=False),
        default=True,
        output_field=models.BooleanField()
    )), Value(False))
)

but that likely makes it less readable and less efficient.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
0

For some extra context: I think this is related to a known and documented limitation (though the docs are easy to miss). The docs on aggreation have this section (copied from Django 4.2 docs):

Combining multiple aggregations

Combining multiple aggregations with annotate() will yield the wrong results because joins are used instead of subqueries:

>>> book = Book.objects.first()
>>> book.authors.count()
2
>>> book.store_set.count()
3
>>> q = Book.objects.annotate(Count('authors'), Count('store'))
>>> q[0].authors__count
6
>>> q[0].store__count
6

For most aggregates, there is no way to avoid this problem, however, the Count aggregate has a distinct parameter that may help:

>>> q = Book.objects.annotate(Count('authors', distinct=True), Count('store', distinct=True))
>>> q[0].authors__count
2
>>> q[0].store__count
3

This documentation is about aggregates that produce joins in an annotation, but I suspect essentially the same happens for annotations that produce a join by themselves, like the OP's example.

Another way to circumvent the limitation (though it might not be applicable to the OP's problem) is to put the aggregate in a subquery. I previously explained how to use the django-sql-utils package for this:

For anyone else running into this, a reasonable workaround seems to be to use subqueries for aggregating annotations. This is a bit verbose/hacky in Django currently, as shown by the multitude of approaches in the stackoverflow link from Antoine's comment. However, I've successfully used the django-sql-utils package for this just now. That sounds a bit bulky, but it just has two utilities, one of which is a SubqueryAggregate class (with derived SubqueryCount, SubquerySum, etc.) that make converting a regular joining aggregate into a subquery aggregate easy and concise, without changing the structure much.

For example taking the example from comment 66 and converting the second annotation to a subquery with django-sql-utils, you'd get:

Branch.objects.annotate(
    total=Sum('center__client__loan__amount'),
    repaid=SubquerySum('center__client__loan__payment_schedule__payments__principal'),
)

There is also a bug report tracking this limitation. The OP's code is a bit more like this bug, which was closed as a duplicate of the first one.

Matthijs Kooijman
  • 2,498
  • 23
  • 30