3

We run Postgres 9.6.5 and Django 2.0. We have a Model with fields created_at and value. We need to calculate a 90-day moving average for a certain date_range. This is how we do this:

output = []

for i in range(len(date_range)):
    output.append(
        Model.objects.filter(
            created_at__date__range=(date_range[i]-timezone.timedelta(days=90), date_range[i]),
        ).aggregate(Avg('value'))['value__avg'].days
    )

This uses Avg aggregate function, so it's reasonably fast, however we need one query for every date in date_range. For longer ranges this means a lot of queries.

Postgres can do this in a single query. My question is - can we somehow do this in a single query using Django ORM?

(I know that I can execute raw SQL with Django ORM, but I wanted to avoid this if possible, which is why I'm asking.)

petr
  • 1,099
  • 1
  • 10
  • 23

3 Answers3

3

Assuming you have a single entry per date you can use Django 2.0's new window expressions to calculate a 90-period moving average in a single query:

from django.db.models import Avg, F, RowRange, Window

items = Model.objects.annotate(
    avg=Window(
        expression=Avg('value'), 
        order_by=F('created_at').asc(), 
        frame=RowRange(start=-90,end=0)
    )
)

See also ValueRange if you would like to frame by specific field values instead, which might come in handy if you have multiple rows for each given day for example.

minkwe
  • 501
  • 5
  • 6
1

Instead of aggregation, you could use annotation. Consider this as I start for testing, I'm not completely sure about the code below. See also the docs about F() objects

    Model.objects.annotate(
        value_avg=Avg(
            'value',
            filter=Q(
                created_at__date__range=(
                    F('created_at__date')-timezone.timedelta(days=90),
                    F('created_at__date')
                )
            )
        )
    )

your_date_field depends what you

ascripter
  • 5,665
  • 12
  • 45
  • 68
  • Thanks. I can't see the logic in your solution. I need a moving average for every day in the date range. Why would it help if I had it for a model instance? – petr Feb 14 '18 at 18:23
  • I thought each date was associated with your model. But if it's independent, my approach doesn't work indeed. – ascripter Feb 14 '18 at 18:41
  • It is, but it's one to many relation. One date can have multiple model instances. Sorry about not clarifying that in the question. – petr Feb 14 '18 at 18:44
1

Another try. This is more performant in that it only uses one query, but fetches all required Model-instances from the DB to do the logic in python instead of the DB level. Still not optimal, but hopefully it does the right thing this time ;) You have to compare if it really gives a performance improvement in your case.

import numpy as np
instances =  Model.objects.filter(
        created_at__gte=min(date_range)-timezone.timedelta(days=90),
        created_at__lte=max(date_range)
    ).values('created_at', 'value')

instances = list(instances)  # evaluate QuerySet and hit DB only once

output = []
for i in range(len(date_range)):    
    output.append(
        np.mean(np.array([inst.value for inst in instances if \
            inst.created_at >= date_range[i]-timezone.timedelta(days=90) and \
            inst.created_at <  date_range[i]
        ]))
    )
ascripter
  • 5,665
  • 12
  • 45
  • 68