34

Say I have a model:

class Foo(models.Model):
    ...

and another model that basically gives per-user information about Foo:

class UserFoo(models.Model):
    user = models.ForeignKey(User)
    foo = models.ForeignKey(Foo)
    ...

    class Meta:
        unique_together = ("user", "foo")

I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.

So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)

James Tauber
  • 3,386
  • 6
  • 27
  • 37
  • 2
    On possibility is two queries: `UserFoo.objects.filter(user=request.user).select_related("foo")` and then `Foo.objects.exclude(userfoo__user=request.user)` but looking for other possibilities – James Tauber Jun 27 '11 at 23:45
  • What is the end goal / What is the use case? – Dan Loewenherz Jun 28 '11 at 03:46
  • @Dan I'd hoped that was clear from the question: UserFoo contains per-user information about Foo and I want to display a list of Foo's annotated with the user information from UserFoo for request.user – James Tauber Jun 28 '11 at 05:46
  • 1
    Why are you using `exclude` and not `filter` in your query above? Isn't the point to find `Foo`s related to the user? And what do you mean by "annotated"? Your use here is confusing, since Django has a very specific definition of annotation, specifically "describes an aggregate that is to be calculated." – Dan Loewenherz Jun 28 '11 at 16:56
  • the reason for `exclude` is that the `UserFoo` filter already gives me the ones related to the user; I want the `Foo`s that aren't related to the user too as it's a left outer join – James Tauber Jun 28 '11 at 22:16

10 Answers10

18

This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.

Note: tested on Djang 1.7

Suppose you have the following models

class User(models.Model):
    name = models.CharField()

class EarnedPoints(models.Model):
    points = models.PositiveIntegerField()
    user = models.ForeignKey(User)

To get total user points you might do something like that

 User.objects.annotate(points=Sum("earned_points__points"))

this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql

You can achieve that by doing this

 users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
 result = users_with_points | User.objects.exclude(pk__in=users_with_points)

This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.

Hope that helps

djvg
  • 11,722
  • 5
  • 72
  • 103
Ramast
  • 7,157
  • 3
  • 32
  • 32
  • 1
    This should, by far, be the accepted answer! And you should get extra points! This is like finding a needle in a haystack! Thank you so much! – Bobort Aug 03 '17 at 21:07
17

A solution with raw might look like

foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])

You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.

Brian Rosner
  • 694
  • 4
  • 9
  • 2
    and how would you select the UserFoo fields you want? – James Tauber Jun 27 '11 at 23:59
  • and how do you avoid column name clashes? – James Tauber Jun 27 '11 at 23:59
  • 1
    I specifically chose to not select anything out of `userfoo` to prevent the obvious name clash of the `id` field. If you needed a field from `userfoo` you would modify the query to be `SELECT foo.*, userfoo.columnA FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)` and accessed as `foos[0].columnA` – Brian Rosner Jun 28 '11 at 00:05
  • An example of a LEFT INNER JOIN `companies = CompanyInfo.objects.raw("SELECT company.*, financials.last_trade_price FROM financials_companyinfo as company JOIN financials_marketdata as financials ON company.ticker=financials.ticker")` – tandy Mar 15 '15 at 21:40
9

Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.


Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)

The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.

results = Foo.objects.extra(
    select={"user_comment": "UserFoo.comment"},
    where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
    params=[request.user.id]
)

This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.

connection = (
    UserFoo._meta.db_table, User._meta.db_table,  # JOIN these tables
    "user_id",              "id",                 # on these fields
)

results.query.join(  # modify the query
    connection,      # with this table connection
    promote=True,    # as LEFT OUTER JOIN
)

We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.

print results[0].user_comment

(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)

Community
  • 1
  • 1
Jeremy
  • 1
  • 85
  • 340
  • 366
  • 1
    I'm getting `join() got an unexpected keyword argument 'promote'`. Was this capability removed in more recent versions of Django? – Neil C. Obremski Feb 14 '14 at 23:07
  • 2
    `promote` keyword was removed as part of [ORM Cleanup](https://code.djangoproject.com/ticket/19849). No longer available in Django 1.6. – tcarobruce Feb 21 '14 at 23:56
3

I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.

Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.

Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:

class LinkPost(models.Model):
some fields....

class UserPostVote(models.Model):
    user = models.ForeignKey(User,related_name="post_votes")
    post = models.ForeignKey(LinkPost,related_name="user_votes")
    value = models.IntegerField(null=False, default=0)

where the userpostvote table collect's the votes of users on posts. Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.

First you get the posts for the page:

post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))

so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?

Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.

So here's how I do it:

q1 = posts_page.object_list.query  # The query object of the queryset
q1_alias = q1.get_initial_alias()  # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with 
                                        #parameters, which are none in this example

we now have the query for the queryset, and just wrap it, alias and left outer join to it:

q2_augment = "SELECT B.value as uservote, A.* 
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B 
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)

voila! Now we can access post.uservote for a post in the augmented queryset. And we just hit the database with a single query.

related
  • 794
  • 5
  • 14
2

The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.

Alex Gaynor
  • 14,353
  • 9
  • 63
  • 113
2

You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:

def userfoo_retriever(qs):
    userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
    for i in qs:
        i.userfoo = userfoos.get(i.pk, None)

for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
    print foo.userfoo

This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.

Chris Adams
  • 4,966
  • 1
  • 30
  • 28
  • 1
    As alternative to keep the iterator pattern intact, I've created a fork called https://github.com/vdboor/django-queryset-decorator That might be useful in some edge cases too – vdboor Jul 03 '12 at 15:30
  • 1
    vdboor: I like the full power of django-queryset-transform but that seems like it'd be a nice time-saver for simple cases where you don't the extra flexibility. – Chris Adams Jul 06 '12 at 13:18
1

As for outerjoins: Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with qs.query.promote_joins(["userfoo"])

maparent
  • 31
  • 3
1

You shouldn't have to resort to extra or raw for this.

The following should work.

Foo.objects.filter(
    Q(userfoo_set__user=request.user) |
    Q(userfoo_set=None)  # This forces the use of LOUTER JOIN.
).annotate(
    comment=F('userfoo_set__comment'),
    # ... annotate all the fields you'd like to see added here.
)
Simon Charette
  • 5,009
  • 1
  • 25
  • 33
  • 1
    This doesn't work. It's possible that this will create a join, but the join could be on to a UserFoo record with the wrong `user` value, which would then be filtered out. This is because the `user` condition is being made in your `WHERE` clause instead of in the `ON` join condition. – practual Feb 03 '20 at 21:34
1

The only way I see to do this without using raw etc. is something like this:

Foo.objects.filter(
    Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
    When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))

The double Q trick ensures that you get your left outer join.

Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).

Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.

Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.

practual
  • 2,363
  • 2
  • 11
  • 12
-1

maparent's comment put me on the right way:

from django.db.models.sql.datastructures import Join

for alias in qs.query.alias_map.values():
  if isinstance(alias, Join):
    alias.nullable = True

qs.query.promote_joins(qs.query.tables)
ajaest
  • 587
  • 4
  • 13