10

I have a basic model like:

class Stats(models.Model):

   created = models.DateTimeField(auto_now_add=True)
   growth = models.IntegerField()

I run a celery job every 10 minutes to create a new stats object.

Using .latest() on a QuerySet gives me the latest Stats object to date.

However, I want a list with one Stats object for each day.

Consider the following:

Stats(growth=100) #created 1/1/13 23:50
Stats(growth=200) #created 1/1/13 23:59
Stats(growth=111) #created 1/2/13 23:50
Stats(growth=222) #created 1/2/13 23:59

The QuerySet should return the latest for each day. In the example the one with 200 and 222 growth.

In SQL I'd fire up a subquery with the max for each day and join it together.

Since I don't want to use raw SQL, is there a way to do this with the django ORM?

Jay
  • 2,519
  • 5
  • 25
  • 42
  • 1
    Just to get this clear in my head; if you want the latest for each day - in your example would you not want the 200 and 222 growth? – Ewan Jun 20 '13 at 12:26
  • yep, that's right. I corrected it ;) – Jay Jun 20 '13 at 12:38
  • See https://stackoverflow.com/questions/43775102/annotate-a-queryset-with-the-latest-value-from-a-filtered-join – Risadinha Sep 18 '18 at 11:25

4 Answers4

4

Unfortunately there's no way (that I'm aware of.. I looked pretty hard) to avoid using some kind of raw sql to accomplish what you want to do (with your current model; see the very end for another suggestion). But you can minimise the impact by writing as little raw sql as possible. In practise django sites do not need to be portable across different databases. Unless you plan on using this app elsewhere or releasing it publicly, you should be ok.

The below example is for sqlite. You could keep a mapping of database types to date functions, look up the type of driver, and replace the function with the correct one if you needed to.

>>> for stat in Stats.objects.all():
...     print stat.created, stat.growth
...
2013-06-22 13:41:25.334262+00:00 3
2013-06-22 13:41:40.473373+00:00 3
2013-06-22 13:41:44.921247+00:00 4
2013-06-22 13:41:47.533102+00:00 5
2013-06-23 13:41:58.458250+00:00 6
2013-06-23 13:42:01.282702+00:00 3
2013-06-23 13:42:03.633236+00:00 1

>>> last_stat_per_day = Stats.objects.extra( 
            select={'the_date': 'date(created)' }
        ).values_list('the_date').annotate(max_date=Max('created'))

>>> last_stat_per_day
[(u'2013-06-22', datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>)), (u'2013-06-23', datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>))]

>>> max_dates = [item[1] for item in last_stat_per_day]
>>> max_dates
[datetime.datetime(2013, 6, 22, 13, 41, 47, 533102, tzinfo=<UTC>), 
 datetime.datetime(2013, 6, 23, 13, 42, 3, 633236, tzinfo=<UTC>)]

>>> stats = Stats.objects.filter(created__in=max_dates)
>>> for stat in stats:
...     print stat.created, stat.growth
...
2013-06-22 13:41:47.533102+00:00 5
2013-06-23 13:42:03.633236+00:00 1

I had written here before that this was only a single query but I lied - the values_list needs to be transformed to only return the max_date for the successive query, which means running the statement. It's only 2 queries though, which would be significantly better than an N+1 function.

The non-portable bit is this:

last_stat_per_day = Stats.objects.extra( 
    select={'the_date': 'date(created)' }
).values_list('the_date').annotate(max_date=Max('created'))

Using extra isn't ideal, but the raw sql here is simple, and lends itself nicely to a database driver dependent replacement. Only the date(created) needs to be replaced. You can wrap this up in a method on a custom manager if you like, and you've then successfully abstracted this mess away in a single location.

The other option is to just add a DateField to your model, and then you don't need to use extra at all. You'd simply replace the values_list call with a values_list('created_date'), remove the extra completely, and call it a day. The cost is obvious - more storage space required. It's also non-intuitive as to why you have a Date and a DateTime field on the same model. Keeping the two in sync may also pose problems.

Josh Smeaton
  • 47,939
  • 24
  • 129
  • 164
4

TruncDate ist new in Django >2.0 and its now possible to make the same query shorter, but just in databases with distinct support like PostgreSQL.

Stats.objects.all().annotate(date=TruncDay('created')).distinct('created').order_by('-date')

YYYY-MM-DD
  • 4,411
  • 1
  • 15
  • 14
0

Maybe you can do somehting like:

import datetime
day = datetime.datetime.now().day
the_last_one = Stats.objects.filter(created__day=day).order_by('-created')[0]

or Something like

the_last_one = Stats.objects.filter(created__day=day).order_by('created').latest()
Victor Castillo Torres
  • 10,581
  • 7
  • 40
  • 50
  • they'll return the latest Stats object, not a list of objects with the latest stat for each day. – Jay Jun 21 '13 at 07:00
0

On top of the other two answers, maybe also consider storing the results in another model (especially if the data per day doesn't change much after entering and you have large amounts of data). Something like:

class DailyStat(models.Model):
    date = models.DateField(unique=True)
    # Denormalisation yo
    # Could also store foreign keys to Stats instances if needed
    max_growth = models.IntegerField()
    min_growth = models.IntegerField()
    # .
    # .
    # .
    # and any other stats per day e.g. average per day

And add a periodic Celery task:

from celery.task.schedules import crontab
from celery.task import periodic_task
import datetime

# Periodic task for 1am daily
@periodic_task(run_every=crontab(minute=0, hour=1))
def process_stats_ery_day():
    # Code to populate DailyStat
    today = datetime.date.today()
    # Assumes relevant custom Manager methods exist
    # Can use regular Django ORM methods to achieve this
    max = Stats.objects.get_max_growth(date=today)
    min = Stats.objects.get_min_growth(date=today)
    ds = DailyStat(date=today, max_growth=max.growth, min_growth=min.growth)
    ds.save()

Retrieve results with:

DailyStat.objects.all()

Of course, amongst other factors to consider, this approach presents the issue of having to update DailyStat when a past stat changes et cetera (signals can be used if you do take this path.)

stellarchariot
  • 2,872
  • 1
  • 19
  • 28