2

How do you make Django's ORM wrap queries like SELECT * FROM (<sql> OFFSET 0) ORDER BY <sort> LIMIT <n>? This is an optimization trick to forcing PostgreSQL to not do expensive scans.

I have a Django admin interface wrapping an especially large table with millions of records. When it tries to do simple queries for the changelist page, it can take minutes to run, even if there are few or no filters being used, regardless of how many indexes I've made.

However, if I run a raw SQL query from the command line or pgAdmin4 against it, like:

 SELECT * FROM mybigtable WHERE active=true AND cat_id IN (1,2,3) ORDER BY last_checked;

it runs fine, in just 3 seconds and appears to use indexes correctly.

But yet if I add a small LIMIT, which you'd think would improve performance by requiring fewer records, and run:

 SELECT * FROM mybigtable WHERE active=true AND cat_id IN (1,2,3) ORDER BY last_checked LIMIT 10;

this somehow prevents the planner from using indexes and it takes an astonishingly long 5 minutes. If run EXPLAIN on it, I get:

"Limit  (cost=0.56..11040.78 rows=10 width=1552)"
"  ->  Index Scan Backward using mybigtable_index on mybigtable  (cost=0.56..36844505.94 rows=33373 width=1552)"
"        Filter: (active AND (cat_id = ANY ('{1,2,3}'::integer[])))"

so it looks like it's using the index, but doing a "backwards scan", which I've read isn't optimal.

Then I read about the "OFFSET 0" trick. If I rewrite my query to:

 SELECT * FROM (
 SELECT * FROM mybigtable WHERE active=true AND cat_id IN (1,2,3)
 OFFSET 0) ss
 ORDER BY last_checked LIMIT 10;

it's logically equivalent, but for some reason this makes the query again run in just 3 seconds, and the EXPLAIN output is:

"Limit  (cost=436820.03..436820.05 rows=10 width=1552)"
"  ->  Sort  (cost=436820.03..436903.46 rows=33373 width=1552)"
"        Sort Key: mybigtable.last_checked DESC"
"        ->  Bitmap Heap Scan on mybigtable  (cost=313163.65..435765.12 rows=33373 width=1552)"
"              Recheck Cond: ((cat_id = ANY ('{1,2,3}'::integer[])) AND active)"
"              ->  BitmapAnd  (cost=313163.65..313163.65 rows=33373 width=0)"
"                    ->  Bitmap Index Scan on mybigtable_cat_ids  (cost=0.00..53342.96 rows=3337280 width=0)"
"                    ->  Bitmap Index Scan on mybigtable_active_true  (cost=0.00..254011.61 rows=15891810 width=0)"

I'm not sure how to interpret this, but what's going on? Why is OFFSET so vastly improving the query?

How do I get Django to use this syntax?

Cerin
  • 60,957
  • 96
  • 316
  • 522
  • What about **`MyModel.objects.all()[10:25]`**? ( the *array slicing* ?) – JPG May 22 '20 at 18:30
  • @ArakkalAbu I believe that just constructs SQL that just has OFFSET/LIMIT appended to it and doesn't use a subquery with OFFSET. – Cerin May 22 '20 at 19:56

0 Answers0