0

I'm trying to aggregate values down to a specific level with Django, but it does not seem straighforward : aggregation occurs at the full queryset or at the object level, but not 'in-between'.

Here is my query : I need to compute a total activity time by year from multiple event entries.

events.annotate(                      # Extracting the year
       year = ExtractYear('dtstart')
   ).annotate(                        # Computing duration... 
       time_span= ExpressionWrapper(
               F('dtend') - F('dtstart'), 
               output_field=IntegerField()
           )
   ).values(                          # ... then aggregating by year
       'year', 'time_span'             #!!!! ... BUT ALSO unfortunately
                                      # by 'time_span' !!!!!!!!!!!!!!!
   ).annotate(                        # otherwise 'time_span' would not
       total_span=Sum('time_span')    # be available for a new computation.
   ).values(                          # Therefore, the result is not
       'year', 'total_span'           # grouped by year, but by the
   ).order_by(                        # distinct combination of 'year'
       'year'                         # AND 'total_span'
   )

As a consequence, I get this result :

<QuerySet [
{'total_span': 1800000000, 'year': 2016}, 
{'total_span': 7200000000, 'year': 2016}, 
{'total_span': 2700000000, 'year': 2016}, 
{'total_span': 14400000000, 'year': 2016}, 
{'total_span': 8100000000, 'year': 2017}, ...>

instead of something like this :

<QuerySet [
{'total_span': 16700000000, 'year': 2016}, 
{'total_span': 19800000000, 'year': 2017}, ...>

In order to set the aggregation level, I would need something like :

aggregate(total_span=Sum('time_span'), group_by='year')

I guess I should turn to subquery or custom aggregation class, but this is beyond my skills so far. Any hint would be warmly appreciated...

Atopus
  • 35
  • 7

1 Answers1

0

I'm thinking this might be one of those cases where you're going to go back to basics and write some SQL. ORMs are great, but sometimes its more work to try to use it then to just default back to what works. You probably could get it to work with the ORM, but this is probably much easier.

from django.db import connection

cursor = connection.cursor()
cursor.execute("""
    select
      sum(dtend - dtstart) as total,
      DATE_FORMAT(dtstart, "%Y") as year
    from events
    group by year;""")

for row in cursor.fetchall():
    # Do something with it

Edit: As always, make sure to sanitize any user input that you may put throught raw SQL

Brobin
  • 3,241
  • 2
  • 19
  • 35