10

Here is my Django class:

from datetime import datetime
from django.db import models

class MyClass(models.Model):
    my_integer = models.IntegerField()
    created_ts = models.DateTimeField(default=datetime.utcnow, editable=False)

I would like to retrieve the instances of MyClass that have the latest created_ts for each unique value of my_integer. I cannot figure out how to do it.

Can someone show me how to do it?

Saqib Ali
  • 'latest' means only one. – Gocht Sep 02 '15 at 17:56
  • 1
    "latest created_ts for each unique value of my_integer". That means more than one. – Saqib Ali Sep 03 '15 at 02:34
  • 1
    @SaqibAli Your question states "I would like to retrieve the instances of `MyClass` [...]" but you accepted an answer that does not retrieve instances of `MyClass`. It returns dictionaries. You even [commented](http://stackoverflow.com/questions/32359954/how-to-make-django-queryset-that-selects-records-with-max-value-within-a-group#comment52598332_32361355) on another answer that you want instances of the class. – Louis Sep 09 '15 at 15:45
  • @SaqibAli I also have the exact same use case, and can not figure out how to do this. Can you tell if you were able to implement this, and how? – S.K Apr 02 '20 at 11:22

6 Answers

5

This will help you:

from django.db.models import Count, Max
MyClass.objects.values('my_integer').annotate(count=Count('my_integer'), latest_date=Max('created_ts'))

Data in table

  my_integer  created_ts
  ----------  --------------------------------
  1           2015-09-08 20:05:51.144321+00:00
  1           2015-09-08 20:08:40.687936+00:00
  3           2015-09-08 20:08:58.472077+00:00
  2           2015-09-08 20:09:08.493748+00:00
  2           2015-09-08 20:10:20.906069+00:00

Output

[
    {'count': 2, 'latest_date': datetime.datetime(2015, 9, 8, 20, 8, 40, 687936, tzinfo=<UTC>), 'my_integer': 1},
    {'count': 2, 'latest_date': datetime.datetime(2015, 9, 8, 20, 10, 20, 906069, tzinfo=<UTC>), 'my_integer': 2},
    {'count': 1, 'latest_date': datetime.datetime(2015, 9, 8, 20, 8, 58, 472077, tzinfo=<UTC>), 'my_integer': 3}
]
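
For readers who want to see what this aggregation does at the database level, here is a minimal sketch using the standard-library sqlite3 module in place of Django. The table name app_myclass, the column alias row_count, and the SQL itself are assumptions about roughly what the ORM would generate, not its exact output. It also shows why the result contains no primary key: each row describes a group, not a model instance.

```python
import sqlite3

# In-memory stand-in for the model's table; "app_myclass" is an assumed name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_myclass ("
    "id INTEGER PRIMARY KEY, my_integer INTEGER, created_ts TEXT)"
)
rows = [
    (1, "2015-09-08 20:05:51.144321"),
    (1, "2015-09-08 20:08:40.687936"),
    (3, "2015-09-08 20:08:58.472077"),
    (2, "2015-09-08 20:09:08.493748"),
    (2, "2015-09-08 20:10:20.906069"),
]
conn.executemany(
    "INSERT INTO app_myclass (my_integer, created_ts) VALUES (?, ?)", rows
)

# One output row per my_integer: the group size and the newest timestamp.
# ISO-formatted text timestamps compare in chronological order.
grouped = conn.execute(
    """
    SELECT my_integer, COUNT(my_integer) AS row_count, MAX(created_ts) AS latest_date
    FROM app_myclass
    GROUP BY my_integer
    ORDER BY my_integer
    """
).fetchall()

for my_integer, row_count, latest_date in grouped:
    print(my_integer, row_count, latest_date)
```
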
iammehrabalam
0

Try this:

from django.db.models import Max

MyClass.objects.values('my_integer').annotate(Max('created_ts'))
Geo Jacob
  • 1
    This gives us only the ``my_integer`` and ``created_ts__max`` values of the objects, not the actual instances. – maxsocl Sep 05 '15 at 20:00
0

This is a really basic way to do it. As long as you don't have a very large amount of data, it will work decently. You can use it in your views by overriding get_queryset and returning filtered, or as a static method on your class if you plan on using it everywhere.

values = MyClass.objects.order_by('-created_ts').all()

filtered = []
existing = []
for value in values:
    if value.my_integer not in existing:
        existing.append(value.my_integer) 
        filtered.append(value)  

Since the list is ordered most recent first, the first instance encountered for each integer is its latest one, and that is the one added to filtered. I did some basic testing with it, but not much, so there may be a flaw or two in there. Tested with SQLite.
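
The same "newest first, keep the first one seen" idea can be sketched without a database. This is only an illustration: Row is a hypothetical stand-in for a MyClass instance, latest_per_integer is a made-up helper name, and a set replaces the list of seen integers so that each membership check is constant time.

```python
from collections import namedtuple

# Hypothetical stand-in for a MyClass instance.
Row = namedtuple("Row", ["id", "my_integer", "created_ts"])


def latest_per_integer(rows):
    """Return the newest row for each distinct my_integer value."""
    seen = set()
    latest = []
    # Newest first, so the first row seen for an integer is its latest one.
    for row in sorted(rows, key=lambda r: r.created_ts, reverse=True):
        if row.my_integer not in seen:
            seen.add(row.my_integer)
            latest.append(row)
    return latest


rows = [Row(1, 1, 10), Row(2, 1, 20), Row(3, 3, 30), Row(4, 2, 40), Row(5, 2, 50)]
result = latest_per_integer(rows)
print([r.id for r in result])
```
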

Edit

Here is a much faster version.

def iter_tools():
    import itertools
    # groupby only merges adjacent items, so order by the grouping key first
    qs = MyClass.objects.order_by('my_integer')
    filtered = []
    for _, group in itertools.groupby(qs, lambda x: x.my_integer):
        # keep the instance with the latest timestamp in each group
        filtered.append(max(group, key=lambda x: x.created_ts))
    return filtered

Essentially, this fetches all of your objects from the database, groups them by the integer, and then takes the one with the latest timestamp from each group. Speeding this up even more is beyond my skills, but I am sure there are ways.
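
One detail worth checking with this approach: itertools.groupby only groups consecutive items with the same key, so the input has to be ordered by the grouping key. The sketch below illustrates this with plain tuples; Row and latest_by_group are hypothetical names, not part of the answer's code.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in for a MyClass instance.
Row = namedtuple("Row", ["id", "my_integer", "created_ts"])

rows = [Row(1, 1, 10), Row(2, 2, 20), Row(3, 1, 30), Row(4, 2, 40)]


def latest_by_group(rows):
    """Pick the newest row from each run of rows sharing a my_integer."""
    return [
        max(group, key=lambda r: r.created_ts)
        for _, group in itertools.groupby(rows, key=lambda r: r.my_integer)
    ]


# Unsorted input: the integers alternate, so every row is its own group.
unsorted_result = latest_by_group(rows)

# Sorted by the grouping key: one group per integer, as intended.
sorted_result = latest_by_group(sorted(rows, key=lambda r: r.my_integer))

print([r.id for r in unsorted_result])
print([r.id for r in sorted_result])
```
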

Here is the timeit of this version versus the earlier one (named manual below), with only about 6 entries in the database:

In[]: timeit.timeit(manual, number=1500)
Out[]: 0.5577559471130371
In[]: timeit.timeit(iter_tools, number=1500)
Out[]: 0.39012885093688965
-----------------------------------------------
In[]: timeit.timeit(manual, number=5000)
Out[]: 1.770777940750122
In[]: timeit.timeit(iter_tools, number=5000)
Out[]: 1.2411231994628906

Edit 2: I created 60000 objects in the database to try it out with some data. I generated the data with django-fixtureless, so the integers are completely random and each object's timestamp is a fresh datetime.now().

In[]: timeit.timeit(manual, number=1)
Out[]: 11.946185827255249
In[]: timeit.timeit(iter_tools, number=1)
Out[]: 0.7811920642852783
In[]: timeit.timeit(iter_tools, number=100)
Out[]: 77.93837308883667
In[]: MyClass.objects.all().count()
Out[]: 60000

A note about the DB: In the above examples I was using sqlite3 on my local machine. I have just set up a quick MySQL server in a VM and got much better speed.

In[16]: MyClass.objects.all().count()
Out[16]: 60000
In[17]: timeit.timeit(iter_tools, number=100)
Out[17]: 49.636733055114746
In[18]: timeit.timeit(iter_tools, number=1)
Out[18]: 0.4923059940338135

Either way, you get the same objects returned. If performance is an issue, I would recommend using either the itertools version or a custom SQL query.

Jared Mackey
  • So the sorting/grouping can't be done natively by the database using Django querysets? I was hoping to avoid writing code to do it. I figured having the DB do it would be more efficient. – Saqib Ali Sep 07 '15 at 06:49
  • See my latest edit, I posted some timings with a real MySQL database. But to answer your question, I am not aware of any way to do it straight from the DB without custom SQL. That doesn't mean there aren't ways. I am fairly inexperienced with the combination of MySQL and Django, so there might be some. However, doing it in plain code might be easier than trying to find some complex hidden feature of Django. – Jared Mackey Sep 07 '15 at 06:52
  • Don't forget you can also filter the queryset down to a more recent time window if you are creating these objects really frequently. For example, if you know all the results you need fall within the last 5 minutes, you can add a filter to the qs, which will drastically reduce the number of objects the function has to loop through. The correct answer will really vary depending on the size of your data, how frequently the data is updated, how frequently the results need to be calculated and how fast it needs to be done. – Jared Mackey Sep 07 '15 at 06:59
0

You can either do a raw query:

MyClass.objects.raw("""
SELECT m1.id, m1.my_integer, m1.created_ts
FROM app_myclass AS m1, (
    SELECT my_integer, MAX(created_ts) AS max_created_ts
    FROM app_myclass
    GROUP BY my_integer
) AS m2
WHERE m1.my_integer = m2.my_integer AND m1.created_ts = m2.max_created_ts
""")

Or use the Django ORM:

MyClass.objects.filter(
    created_ts__in=MyClass.objects.values(
        "my_integer"
    ).annotate(
        created_ts=models.Max(
            "created_ts"
        )
    ).values_list("created_ts", flat=True)
)

Note that this requires only a single SQL query, as you can see by printing len(django.db.connection.queries) before and after the query.

However, note that the latter solution only works if your created_ts attribute is guaranteed to be unique, which might not be your case.
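
To make the uniqueness caveat concrete, here is a small sketch with the standard-library sqlite3 module. The table name app_myclass is taken from the raw query above; the sample rows and the IN query (an approximation of what the ORM version would run) are assumptions for illustration. Row 5 shares a timestamp with the latest row of another group, so the IN form picks it up even though it is not the latest row of its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_myclass ("
    "id INTEGER PRIMARY KEY, my_integer INTEGER, created_ts TEXT)"
)
conn.executemany(
    "INSERT INTO app_myclass (id, my_integer, created_ts) VALUES (?, ?, ?)",
    [
        (1, 1, "2015-09-08 20:00:00"),
        (2, 1, "2015-09-08 20:10:00"),  # latest for my_integer = 1
        (3, 2, "2015-09-08 20:05:00"),
        (4, 2, "2015-09-08 20:20:00"),  # latest for my_integer = 2
        (5, 2, "2015-09-08 20:10:00"),  # same timestamp as row 2, not a latest row
    ],
)

# Approximation of the created_ts__in query: matches on the timestamp alone.
in_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM app_myclass
        WHERE created_ts IN (
            SELECT MAX(created_ts) FROM app_myclass GROUP BY my_integer
        )
        ORDER BY id
        """
    )
]

# The join from the raw query: matches on both my_integer and the timestamp.
join_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT m1.id
        FROM app_myclass AS m1, (
            SELECT my_integer, MAX(created_ts) AS max_created_ts
            FROM app_myclass
            GROUP BY my_integer
        ) AS m2
        WHERE m1.my_integer = m2.my_integer AND m1.created_ts = m2.max_created_ts
        ORDER BY m1.id
        """
    )
]

print(in_ids)
print(join_ids)
```
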

If you're not willing to use raw queries or a unique index on created_ts, then you should probably start using PostgreSQL and its DISTINCT ON feature, as suggested by other answers.

Régis B.
-1

Untested:

results = MyClass.objects.all().distinct('my_integer').order_by('created_ts')
Cody Bouche
  • 1
    This gave me the exception: `NotImplementedError: DISTINCT ON fields is not supported by this database backend`. FYI, I'm using mysql. – Saqib Ali Sep 02 '15 at 19:22
-1
MyClass.objects.order_by('my_integer', '-created_ts').distinct('my_integer')

According to the documentation for distinct(), the fields passed to distinct() must match the leading fields of order_by(), in the same order. Hence, order the elements by integer and then by reverse timestamp, and call distinct on the integer; this returns the latest instance for every integer.

Anvesh Arrabochu
  • 1
    This gave me the exception: `NotImplementedError: DISTINCT ON fields is not supported by this database backend`. FYI, I'm using mysql. – Saqib Ali Sep 02 '15 at 19:23
  • Refer to this: http://stackoverflow.com/questions/12402923/django-mysql-distinct-query-for-getting-multiple-values The MySQL backend does not support the `distinct()` operation on fields. Its equivalent (as far as I know) is `MyClass.objects.values('my_integer', 'created_ts').distinct().order_by('-created_ts')`. Tested it and it works as expected. – Anvesh Arrabochu Sep 02 '15 at 20:26
  • Thanks ianveshi. This gets me closer, but it is still not quite right. I need to get the actual instances of MyClass that meet this criterion. Your solution only gives the values of `created_ts` and `my_integer`. You have to assume that in actuality MyClass is a much more complex class than just `my_integer` and `created_ts`. I need the whole thing. How? – Saqib Ali Sep 02 '15 at 20:46
  • Unfortunately, due to my noobosity, the only way I know to achieve that is through the PostgreSQL backend. Sorry. – Anvesh Arrabochu Sep 03 '15 at 04:36
  • 1
    Check this solution again, because this QS returns actual instances with all fields. User.objects.order_by('email', '-username').distinct('email') -> [, , ]. This gives us the users with the max ``username`` grouped by ``email``. The User model is just an example. – maxsocl Sep 05 '15 at 19:55
  • @maxsocl thank you sooo much, this is the most underrated comment on SO – Pavel Shishmarev Aug 01 '22 at 11:06