0

I found this discussion about annotation over querysets. I wonder, if that ever got implemented?

When I run the following annotation:

x = Isolate.objects.all().annotate(sum_shipped=Sum('shipped'))
x[0].sum_shipped
>>> True

shipped is a boolean field, so I would expect here the number of instances where shipped is set to True. Instead, I get True or 1. That is pretty inconvenient behaviour.

There is also this discussion on stackoverflow. However, this only covers django 1.

I would expect something like Sum([True, True, False]) -> 2. Instead, True is returned.

Seems this was not touched since then. Is that discussion in the second link still the state-of-the-art ???

Is there a better way to do statistics with the content of the database than this, now that Django is in version 4?

My current database is sqlite3 for development and testing, but will be Postgres in production. I would very much like to get database independent results.

SUM(bool_field) on Oracle will return 1 also, because bools in Oracle are just a bit (1 or 0). Postgres has a specific bool type, and you can't sum a True/False without an implicit conversion to int. This is why I say SUM(bool) is only accidentally supported for a subset of databases. The field type that is returned is based on the backend get_db_converters and the actual values that come back.


Example model

class Isolate(models.Model):
    isolate_nbr = models.CharField(max_length=10, primary_key=True)
    growing = models.BooleanField(default=True)
    shipped = models.BooleanField(default=True)
    ....
Soerendip
  • 7,684
  • 15
  • 61
  • 128
  • Can you share your model(s)? You should be able to do this with a `Count` with a filter – Iain Shelvington Feb 13 '22 at 05:59
  • Yes, I can do that with a filter and then get the length of the queryset with something like x = len(x), but that smells so badly. That also means, I have to run a filter for all fields that I want to get statistics from. I wonder what is the most elegant way to do this right now. – Soerendip Feb 13 '22 at 06:02
  • PS: using filter and count, btw, does not work. Something like this: `Isolate.objects.filter(shipped=True).annotate(sum_shipped=Count('shipped'))` returns a 1 as well. – Soerendip Feb 13 '22 at 06:07
  • It's not clear what your expected behaviour is, you're not grouping your queryset by anything so your annotation is only using a single row. What is your desired output? A sum/count of shipped rows per (grouped by) ? – Iain Shelvington Feb 13 '22 at 06:09
  • "I would expect here the number of instances where shipped is set to True" What is unclear with that? I would love to do value counts as well, but that was not the question here. Again, I would expect an integer value. Based on my current database ~8000. – Soerendip Feb 13 '22 at 06:12
  • So you just want a count of instances where shipped is True and not a queryset of instances? Your use of `annotate()` is what is confusing, it's meant for applying a function per row, you probably want something like `aggregate()` instead if you want to calculate the sum for the entire queryset – Iain Shelvington Feb 13 '22 at 06:17
  • Does this return what you want `Isolate.objects.filter(shipped=True).count()`? – Iain Shelvington Feb 13 '22 at 06:19
  • Arg, your eagle eye spotted it! I was using the wrong function. Somewhere, I mixed up `annotate` and `aggregate`. – Soerendip Feb 13 '22 at 06:22
  • Ok, I see that is from the docs: https://docs.djangoproject.com/en/4.0/topics/db/aggregation/ they also use both functions there. – Soerendip Feb 13 '22 at 06:27
  • An aggregation similar to this should work I think `Isolate.objects.aggregate(total_shipped=Count('pk', filter=Q(shipped=True)))` – Iain Shelvington Feb 13 '22 at 06:29
  • Even this works: `x = Isolate.objects.all().aggregate(sum_shipped=Sum('shipped'))` – Soerendip Feb 13 '22 at 06:33

1 Answers1

1

I think you have a few options here, an annotate is not one of them at this time.

from django.db.models import Sum, Count

# Use aggregate with a filter
print(Isolate.objects.filter(shipped=True).aggregate(Sum('shipped')))

# Just filter then get the count of the queryset
print(Isolate.objects.filter(shipped=True).count())

# Just use aggregate without filter (this will only aggregate/Sum the True values)
print(Isolate.objects.aggregate(Sum('shipped')))

# WON'T WORK: annotate will only annotate on that row whatever that row's value is, not the aggregate across the table
print(Isolate.objects.annotate(sum_shipped==Count('shipped')).first().sum_shipped)
Dharman
  • 30,962
  • 25
  • 85
  • 135
Artisan
  • 1,974
  • 1
  • 18
  • 23
  • Yes, I had figured out, in the mean time, that I mixed up annotate and aggregate. Thank you, that is a great summary. – Soerendip Feb 14 '22 at 01:08