2

My issue is similar to this question: How to select only the latest rows for each user? But I am implementing this with Django.

In the following example (which I borrowed from the question above), I need to extract only the last row for each user. Additionally, in my case, I only want to get rows for a particular list of user_ids.

id  | user_id | period_id | completed_on
----------------------------------------
1   | 1       | 1         | 2010-01-01
2   | 2       | 1         | 2010-01-10
3   | 3       | 1         | 2010-01-13
4   | 1       | 2         | 2011-01-01
5   | 2       | 2         | 2011-01-03
6   | 2       | 3         | 2012-01-13
... | ...     | ...       | ...

If the user_list is [1, 2], I'd like to get a result like this:

id  | user_id | period_id | completed_on
----------------------------------------
4   | 1       | 2         | 2011-01-01
6   | 2       | 3         | 2012-01-13

I was writing it using filter, but couldn't figure out the right way.

PeriodTable.objects.filter(user__in=user_list, period_id=max(....?)).values(...)
Community
  • 1
  • 1
MaxHeap
  • 1,138
  • 2
  • 11
  • 20

2 Answers2

0

First a little helper function for converting an array of ints into an argument for a sql IN clause.

def to_sql_in(arr):
    return ','.join(map(str, arr))

Then you can use a cursor for querying directly your db, because in this case I don't think filters would be enough.

user_list = [1, 2]

with connection.cursor() as cursor:
    cursor.execute('''SELECT t.Id, t.User_Id, t.Period_Id, t.Completed_On
                      FROM Table1 t
                      JOIN (SELECT Max(completed_on) Max_Completed_On, t.User_Id
                      FROM Table1 t
                      GROUP BY t.User_Id) t2
                      ON t.User_Id = t2.User_Id AND t.Completed_On = t2.Max_Completed_On
                      WHERE t.User_Id IN ({0})'''.format(to_sql_in(user_list)))

    r = cursor.fetchall()
    columns = [col[0] for col in cursor.description]
    print [dict(zip(columns, row)) for row in r]
Marcs
  • 3,768
  • 5
  • 33
  • 42
0

If by "last" you refer to the maximum of completed_on for each user, and this column is unsorted, then probably you'll have to go directly with SQL (as already suggested). However, if by "last" you mean the last entry on the table for each user, you can solve the problem directly with the Django aggregation API. Assuming that your model is M:

from django.db.models import Max
q = M.objects.filter(user_id__in=users_list)
max_ids = q.values('user_id').annotate(Max('id')).values_list('id__max')
result = M.objects.filter(id__in=max_ids)

The third line simply groups by user_id but remembering the maximum id row for each group, and creates a list of those ids. Finally, the result is the set of rows with those ids.

I stress that this solution works only if you want to obtain the last entry in the database for each user.

thelastone
  • 475
  • 4
  • 11