6

Problem

I want to use a COUNT(DISTINCT field) with a GROUP BY clause in Django. As I understand, the COUNT(DISTINCT... can only be achieved by using an extra for the query set.

My simplified model is :

class Site(models.Model):
    name = models.CharField(max_length=128, unique=True)

class Application(models.Model):
    name = models.CharField(max_length=64)
    version = models.CharField(max_length=13, db_index=True)

class User(models.Model):
    name = models.CharField(max_length=64) 
    site = models.ForeignKey(Site, db_index=True) 

class Device(models.Model):
    imei = models.CharField(max_length=16, unique=True)

    applications = models.ManyToManyField(Application, null=True, db_index=True, through='ApplicationUsage') 
    user = models.ForeignKey(User, null=True, db_index=True)

class ApplicationUsage(models.Model):
    activity = models.DateField(db_index=True)

    application = models.ForeignKey(Application)
    device = models.ForeignKey(Device)

My goal is to have a liste of Site objects with a count of distinct device for each site given an application activity through a time period, something like

stats_site.name     deviceCount
ALBI                32
AMPLEPUIS           42
...

I try this code :

qs = models.Site.objects.filter(user__device__applicationusage__activity__range=[startDay, endDay])\
                            .extra(select={'deviceCount' : 'COUNT(DISTINCT `stats_device`.`id`)'})\
                            .values('name', 'deviceCount')\

The generated SQL is :

SELECT (COUNT(DISTINCT stats_device.id)) AS deviceCount, stats_site.name
FROM stats_site
INNER JOIN stats_user ON (stats_site.id = stats_user.site_id)
INNER JOIN stats_device ON (stats_user.id = stats_device.user_id)
INNER JOIN stats_applicationusage ON (stats_device.id = stats_applicationusage.device_id)
WHERE stats_applicationusage.activity BETWEEN '2013-07-01' AND '2013-07-03'

And the result is obviously wrong since it lacks the GROUP BY clause, which should be GROUP BY stats_site.name

The problem is: I don't know how to add the correct GROUP BY using the annotate function or other.

Solution

Using distinct=True on the Count function with annotate:

qs = models.Site.objects.filter(habileouser__device__applicationusage__activity__range=[startDay, endDay])\
                            .annotate(deviceCount=Count('habileouser__device', distinct=True))\
                            .values('name', 'deviceCount')
Emmanuel Sys
  • 817
  • 6
  • 12
  • Personally, I'd get them all in one query then use python to count, BUT, Did you look at aggregation? https://docs.djangoproject.com/en/dev/topics/db/aggregation/ – Rob L Aug 26 '13 at 19:42
  • Yeah but aggregate would only give one value for the whole query set, so it won't work. For the count using Python, in the end it may be the solution but only if I don't find better way using only SQL count – Emmanuel Sys Aug 26 '13 at 19:57

1 Answers1

13

The annotate method of a queryset will calculate an aggregate value for each element of the queryset, and when used after a values call will aggregate over the values of the values. I think this should work:

qs = models.Site.objects.filter(
    user__device__applicationusage__activity__range=[startDay, endDay]
).values('name').annotate(Count('user__device', distinct=True))

If you have an ordering specified you may need to remove it as discussed here: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#interaction-with-default-ordering-or-order-by

djvg
  • 11,722
  • 5
  • 72
  • 103
Peter DeGlopper
  • 36,326
  • 7
  • 90
  • 83
  • Thanks for your answer. However your proposed solution count all devices, even duplicated ones. The point is to count the number of distinct devices. To add a `distinct()` won't work since the distinct will also apply to all fields in the select clause, even the count – Emmanuel Sys Aug 27 '13 at 17:53
  • 1
    According to this answer, the `Count` aggregate can take a `distinct=True` argument - I had not known that previously. http://stackoverflow.com/questions/4048014/how-to-add-an-annotation-on-distinct-items – Peter DeGlopper Aug 27 '13 at 18:35