1

I am trying to get the maximum value from merging the results from two separate tables. I am trying to get the most updated comment for a Post. I have a model Comment that can be accessed by Post.comments. I also have a body that is separate of the Post that can be accessed by Post.body. Both comments and body have a field when that is a DateTimeField. I would like to return a sorted queryset by most recent activity such that the a Post with the most recent comment or body is shown first.

Assume the models look like this:

class Body(models.Model):
    when = models.DateTimeField(default=datetime.datetime.now)

class Post(models.Model):
    body = models.ForeignKey(Body)

class Comment(models.Model):
    post = models.ForeignKey(Post, related_name='comments')
    when = models.DateTimeField(default=datetime.datetime.now)

I would like the result to remain a queryset if possible since I continue to process it and further pass it into a paginator to limit results.

Basically, I would like to be able to call:

q = Post.annotate(
    activity=Max(
        Union('comments__when', 'body__when')
    )
)
q = q.order_by('-activity')

But I don't know how to accomplish that Union.

I believe the SQL that accomplishes what I'm looking for is comparable to:

SELECT
...
IF(MAX(c.`when`), MAX(c.`when`), b.`when`) AS `activity`
...
FROM `post` p
...
LEFT OUTER JOIN `comment` AS c
ON c.`post_id`=p.`id`
LEFT OUTER JOIN `body` AS b
ON p.`body_id`=b.`id`
...

Can such a customized annotation and Join be accomplished?

garromark
  • 814
  • 8
  • 20

1 Answers1

0

This took me an extremely long time to figure out. One large issue is that Django does not support multiple conditions for the ON clause of a JOIN statement. Because of the dependence of a single SELECT statement upon two separate JOINs, we have an issue tracking the table names accurately. When you ask django to annotate a table, say with Max(), you end up with conditions such as Max(T7.when) ... LEFT OUT JOIN table AS T7, where T7 is not consistent. Thus, I needed a way to accruately generate the expression Max(T7.when) given the JOINs automatically generated by Django. Many posts online tell you to use the .raw() support on querysets, but you'll lose a lot of benefit from using an ORM in this case.

The solution I came up with was to generate a custom aggregate function. I called this function CustomMax():

from django.conf import settings
from django.db import models

sum_if_sql_template = 'GREATEST(%(alt_table)s.%(alt_field)s, IFNULL(%(function)s(%(field)s), %(alt_table)s.%(alt_field)s))'

class CustomMaxSQL(models.sql.aggregates.Aggregate):
    sql_function = 'MAX'
    sql_template = sum_if_sql_template


class CustomMax(models.aggregates.Aggregate):
    name = 'Max'

    def __init__(self, lookup, **extra):
        self.lookup = lookup
        self.extra = extra

    def add_to_query(self, query, alias, col, source, is_summary):
        aggregate = CustomMaxSQL(col,
                             source=source,
                             is_summary=is_summary,
                             **self.extra)
        query.aggregates[alias] = aggregate

The usage of the function is:

q = Post.annotate(
    activity=CustomMax(
        'comments__when',
        alt_table="app_post",
        alt_field="when",
    )
)
q.query.join((
    'app_post',
    'app_comment',
    'id',
    'post_id',
))
q = q.order_by('-activity')

I include the .join() to allow for the alt_table to exist as a JOIN, and Django will automatically handle the naming of both the SELECT and JOIN statements for the Max portion. The generated SQL from this usage is akin to:

SELECT 
...
GREATEST(app_post.when, IFNULL(MAX(T7.`when`), app_post.when)) AS `activity`,
...
`app_post`.`id`
...
INNER JOIN `app_post` ON ...
...
LEFT OUTER JOIN `app_comment` T7 ON (`app_post`.`id` = T7.`post_id`)
...

Note: This is just for Post and Comment models from above. My actual implementation was a bit more complicated and required this solution.

This would also have been a lot easier if the Django team had developed this suggested patch to include join statements in .extra(): https://code.djangoproject.com/ticket/7231

garromark
  • 814
  • 8
  • 20