2

I noticed that a query that used to be fast in legacy version of Django is now much slower in 4.0.8.

There is a fairly large table with a FK 'marker' and a boolean 'flag' that has an index attached. The following queries could reasonably return tens of thousands of rows.

In my codebase, there is a query like

MyModel.objects.filter(marker_id=123, flag=False).count()

In Django Debug Toolbar (and also in shell when I examine str(qs.query)) it now resolves to the following SQL syntax:

SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND NOT `myapp_mymodel`.`flag`)

In extreme cases, this query runs for 20s or so. Meanwhile, in legacy Django version (1.11+) the same query becomes the following SQL:

SELECT ••• FROM `myapp_mymodel` WHERE (`myapp_mymodel`.`marker_id` = 123 AND `myapp_mymodel`.`flag` = 0)

This works, since the table schema contains 'flag' as a TINYINT(1), but most importantly, it works much faster - returning in under a second.

EDIT: I asked sql server to EXPLAIN both queries, and there is a difference in 'flag' appearing as a potential key in the latter (faster) query but not in the slower one. This is consistent with this answer stating that mysql needs to see comparison against a value to know to use an index. Thus, the main question becomes, how can I enforce the syntax that makes use of the index already in place?

END EDIT

Original questions: Why is the difference in ORM-to-SQL translation, and where can I find the code responsible (I have checked db.backends.mysql to no avail, or failed to recognize the culprit)? Is there a way to hint to Django that I'd much prefer the equals-zero behaviour?

The only workaround I see so far is to use raw SQL query. I'd rather avoid that if possible.

zencodism
  • 442
  • 4
  • 9
  • I have also tried querying via `flag__exact=0` - this results with the same SQL under the hood. – zencodism Nov 02 '22 at 17:33
  • I would drop the why question as we can only speculate why the creators of django orm wrote the code the way they did and their rationale is irrelevant for your case. The focus should be on how to get django use the alternative syntax. – Shadow Nov 02 '22 at 17:47
  • Please provide `SHOW CREATE TABLE` so we can discuss what DJango turned its datatype into. – Rick James Nov 02 '22 at 17:57

2 Answers2

1

This is a regression already reported and resolved in Django's issue tracker being issue #32691.

This is fixed in Django 4.1 so if you update it will get resolved automatically. For versions 3.2 to 4.0 you can use the workaround noted by Todor Velichkov on the issue which is to use Value() expressions:

from django.db.models import Value


MyModel.objects.filter(marker_id=123, flag=Value(0)).count()
Abdul Aziz Barkat
  • 19,475
  • 3
  • 20
  • 33
  • Thank you! I can confirm it worked in my case - sql syntax now shows `flag = (0)` and the index is being used. – zencodism Nov 02 '22 at 18:10
0

(MySQL/MariaDB, using ENGINE InnoDB)

It depends on what the index(es) are.

INDEX(flag)

is likely to be used only if less than 20% of the rows match. (The 20% is approximate.)

INDEX(flag, ...)

where the extra columns can be used for filtering, is likely to be used for either TRUE or FALSE.

Note that, in InnoDB, the PRIMARY KEY's column(s) are implicitly tacked onto the end of the index. This turns INDEX(flag) into INDEX(flag, id) possibly allowing the second form above to be relevant.

Rick James
  • 135,179
  • 13
  • 127
  • 222