5

I have a bit of Django code that builds a relatively complicated query in a programmatic fashion, with various filters getting applied to an initial dataset through a series of filter and exclude calls:

for filter in filters:
    if filter['name'] == 'revenue':
        accounts = accounts.filter(account_revenue__in: filter['values'])
    if filter['name'] == 'some_other_type':
        if filter['type'] == 'inclusion':
            accounts = accounts.filter(account__some_relation__in: filter['values'])
        if filter['type'] == 'exclusion':
            accounts = accounts.exclude(account__some_relation__in: filter['values'])
    ...etc

return accounts

For most of these conditions, the possible values of the filters are relatively small and contained, so the IN clauses that Django's ORM generates are performant enough. However there are a few cases where the IN clauses can be much larger (10K - 100K items).

In plain postgres I can make this query much more optimal by using a table value constructor, e.g.:

SELECT domain 
FROM accounts 
INNER JOIN  ( 
    VALUES  ('somedomain.com'), ('anotherdomain.com'), ...etc 10K more times 
) VALS(v) ON accounts.domain=v

With a 30K+ IN clause in the original query it can take 60+ seconds to run, while the table value version of the query takes 1 second, a huge difference.

But I cannot figure out how to get Django ORM to build the query like I want, and because of the way all the other filters are constructed from ORM filters I can't really write the entire thing as raw SQL.

I was thinking I could get the raw SQL that Django's ORM is going to run, regexp parse it, but that seems very brittle (and surprisingly difficult to get the actual SQL that is about to be run, because of parameter handling etc). I don't see how I could annotate with RawSQL since I don't want to add a column to select, but instead want to add a join condition. Is there a simple solution I am missing?

v_krishna
  • 206
  • 2
  • 6
  • 1
    As for extracting the IN clauses: these are neatly grouped up in `accounts.query.where.children` With exclude clauses being negated IN clauses (which themselves are the children of the exclude clause i.e. `where.children[index_of_exclude].children`) and IN clause children having `lhs` (columns) and `rhs` (values) attributes. With that you might have an easier time constructing a raw sql query... I still wouldn't try that, mind you. – CoffeeBasedLifeform Jun 21 '18 at 20:30

0 Answers0