12

I have a table which contains list of some web sites and a table with statistics of them.

class Site(models.Model):
    domain_name = models.CharField(
        max_length=256,
        unique=True,
    )


class Stats(models.Model):
    date = models.DateField()
    site = models.ForeignKey('Site')
    google_pr = models.PositiveIntegerField()

    class Meta:
        unique_together = ('site', 'date')

I want to see all sites and statistics for a concrete date. If a stats record for the date doesn't exist, then the selection must contain only site.

If I use:

Site.objects.filter(stats__date=my_date)

I will not get sites which have no records for my_date in stats table. Because in this case the SQL query will be like the following:

SELECT *
FROM site
LEFT OUTER JOIN stats ON site.id = stats.site_id
WHERE stats.date = 'my_date'

The query condition will exclude records with NULL-dates and sites without stats will be not included to the selection.

In my case I need join stats table, which has already been filtered by date:

SELECT *
FROM site
LEFT OUTER JOIN
  (SELECT *
   FROM stats
   WHERE stats.date = 'my-date') AS stats
ON site.id = stats.site_id

How can I translate this query to Django ORM?

Thanks.

psln
  • 121
  • 1
  • 1
  • 4
  • 2
    Everyone seems to avoid the actual question like a hot potato :D Debanshu Kundu answer is the one that addresses the actual question: How to join a subquery in Django ORM. – korulis Aug 19 '22 at 14:25

3 Answers3

14

In Django v2.0 use FilteredRelation

Site.objects.annotate(
    t=FilteredRelation(
        'stats', condition=Q(stats__date='my-date')
).filter(t__google_pr__in=[...])
Charmy
  • 381
  • 3
  • 5
  • This unfortunately doesn't work for "nested" relationships. Say, if `stats` had a m2m field itself that you wanted to filter on. Anyone know how to deal with that? – getup8 Nov 16 '19 at 07:29
  • I was trying to join with subquery because doing a group by separately was much faster. In fact, it was the fact that I had several `COUNT () FILTER (WHERE)` with part of conditions in common that was slow. So the FilteredRelation with the common criteria improved the execution by roughly 60 times. – Eric Darchis Sep 17 '20 at 14:26
  • 2
    Support for nested relationships was added in Django 3.2 as per their documentation: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#filteredrelation-objects – Darius Bogdan Sep 12 '21 at 14:40
9

I had a similar problem and wrote the following utility function for adding left outer join on a subqueryset using Django ORM.

The util is derived from a solution given to add custom left outer join to another table (not subquery) using Django ORM. Here is that solution: https://stackoverflow.com/a/37688104/2367394

Following is the util and all related code:

from django.db.models.fields.related import ForeignObject
from django.db.models.options import Options
from django.db.models.sql.where import ExtraWhere
from django.db.models.sql.datastructures import Join


class CustomJoin(Join):
    def __init__(self, subquery, subquery_params, parent_alias, table_alias, join_type, join_field, nullable):
        self.subquery_params = subquery_params
        super(CustomJoin, self).__init__(subquery, parent_alias, table_alias, join_type, join_field, nullable)

    def as_sql(self, compiler, connection):
        """
        Generates the full
        LEFT OUTER JOIN (somequery) alias ON alias.somecol = othertable.othercol, params
        clause for this join.
        """
        params = []
        sql = []
        alias_str = '' if self.table_alias == self.table_name else (' %s' % self.table_alias)
        params.extend(self.subquery_params)
        qn = compiler.quote_name_unless_alias
        qn2 = connection.ops.quote_name
        sql.append('%s (%s)%s ON (' % (self.join_type, self.table_name, alias_str))
        for index, (lhs_col, rhs_col) in enumerate(self.join_cols):
            if index != 0:
                sql.append(' AND ')
            sql.append('%s.%s = %s.%s' % (
                qn(self.parent_alias),
                qn2(lhs_col),
                qn(self.table_alias),
                qn2(rhs_col),
            ))
        extra_cond = self.join_field.get_extra_restriction(
            compiler.query.where_class, self.table_alias, self.parent_alias)
        if extra_cond:
            extra_sql, extra_params = compiler.compile(extra_cond)
            extra_sql = 'AND (%s)' % extra_sql
            params.extend(extra_params)
            sql.append('%s' % extra_sql)
        sql.append(')')
        return ' '.join(sql), params

def join_to(table, subquery, table_field, subquery_field, queryset, alias):
    """
    Add a join on `subquery` to `queryset` (having table `table`).
    """
    # here you can set complex clause for join
    def extra_join_cond(where_class, alias, related_alias):
        if (alias, related_alias) == ('[sys].[columns]',
                                    '[sys].[database_permissions]'):
            where = '[sys].[columns].[column_id] = ' \
                    '[sys].[database_permissions].[minor_id]'
            children = [ExtraWhere([where], ())]
            return where_class(children)
        return None
    foreign_object = ForeignObject(to=subquery, from_fields=[None], to_fields=[None], rel=None)
    foreign_object.opts = Options(table._meta)
    foreign_object.opts.model = table
    foreign_object.get_joining_columns = lambda: ((table_field, subquery_field),)
    foreign_object.get_extra_restriction = extra_join_cond
    subquery_sql, subquery_params = subquery.query.sql_with_params()
    join = CustomJoin(
        subquery_sql, subquery_params, table._meta.db_table,
        alias, "LEFT JOIN", foreign_object, True)

    queryset.query.join(join)

    # hook for set alias
    join.table_alias = alias
    queryset.query.external_aliases.add(alias)

    return queryset

join_to is the utility function you want to use. For your query you can use it in as follows:

sq = Stats.objects.filter(date=my_date)
q = Site.objects.filter()
q = join_to(Site, sq, 'id', 'site_id', q, 'stats')

And following statement would print a query similar to you example query (with subquery).

print q.query
Debanshu Kundu
  • 785
  • 7
  • 18
-1

Look at it this way: you want to see statistics with accompanying site data for certain date, which translates to:

Stats.objects.filter(date=my_date).select_related('site')
Suor
  • 2,845
  • 1
  • 22
  • 28
  • Thanks, Suor. But in this case Django ORM uses `INNER JOIN` to join stats to sites and then applies filtering by date. If stats table doesn't contain a record for certain site and date, then the site won't be included to the result. I guess, there is only one way to solve the problem - to join filtered stats table. – psln Apr 07 '14 at 04:56
  • I don't quite understand you. If stats table doesn't contain record than you don't have date and no row should be returned. If you mean that stats can have no site and so would be eliminated on join, than no it's not how it works in Django. You set `site = models.ForeignKey('Site', null=True)` in this case and Django makes left join for you for the same code. – Suor Apr 07 '14 at 05:04
  • Site can have no stats records for certain date. I need to get all sites and stats, which exist for this date. F.e. if `site` table contains records `site_1` and `site_2`, `stats` table contains only `stats_1(site=site_1, date=my_date)`, I want to get `[{site_1, stats_1}, {site_2, NULL}]`. But your code will return just `[{site_1, stats_1}]`. – psln Apr 07 '14 at 05:59
  • 1
    Oh, got it. You will probably need 2 queries for that using Django ORM and then join everything by hand. Just a note, in upcoming Django 1.7 you can use single expression to do it: `Site.objects.prefetch_related(Prefetch('stats_set', queryset=Stats.objects.filter(date=my_date)))`. – Suor Apr 07 '14 at 06:51