Background
I'm getting insanely slow responses to a relatively simple query whenever it has a small LIMIT value.
I've already read and re-read PostgreSQL query very slow with limit 1 (not that I couldn't have missed something, but it's related, and folks shouldn't just direct me there without giving me a clue as to what I might be missing first).
I'm definitely hitting essentially the issue referred to there: for small values of LIMIT, this query takes around 7,409,626 µs vs. 53 µs.
Simply changing LIMIT from 1 to 1000 gives me near-instantaneous speed; dropping it back to 10 or 1 gives me the OMGBBQ-what-is-wrong-here speed.
I tried applying the basic advice from the SO question linked above: add an extra, useless ORDER BY column to the query to trick the planner.
However, in my case the query stays very slow even if I put the main query, without the LIMIT, in a WITH clause!
The Query
select id
from rounds
where userid = (
    select id
    from users
    where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2')
order by created desc
limit 1;
Some explain-analyze results:
LIMIT = 1 (with the dummy ORDER BY column trick applied, to no avail)
explain analyze select id from rounds where userid = (select id from users where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2') order by created desc, userid limit 1;
QUERY PLAN
------------------------------------------------
Limit (cost=3.07..47.03 rows=1 width=40) (actual time=7408.097..7408.099 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using users_integrationuserid_idx on users (cost=0.41..2.63 rows=1 width=16) (actual time=0.013..0.014 rows=1 loops=1)
Index Cond: (integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2'::text)
-> Index Scan using recent_rounds_idx on rounds (cost=0.44..938182.73 rows=21339 width=40) (actual time=7408.096..7408.096 rows=1 loops=1)
Filter: (userid = $0)
Rows Removed by Filter: 23123821
Planning Time: 0.133 ms
Execution Time: 7408.114 ms
(9 rows)
vs. LIMIT = 1000 (arbitrary, just to see what happens)
explain analyze select id from rounds where userid = (select id from users where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2') order by created desc, userid limit 1000;
QUERY PLAN
------------------------------------------------
Limit (cost=24163.47..24165.97 rows=1000 width=40) (actual time=0.048..0.049 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Index Scan using users_integrationuserid_idx on users (cost=0.41..2.63 rows=1 width=16) (actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2'::text)
-> Sort (cost=24160.84..24214.18 rows=21339 width=40) (actual time=0.047..0.048 rows=1 loops=1)
Sort Key: rounds.created DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on rounds (cost=226.44..22990.84 rows=21339 width=40) (actual time=0.043..0.043 rows=1 loops=1)
Recheck Cond: (userid = $0)
Heap Blocks: exact=1
-> Bitmap Index Scan on rounds_userid_idx (cost=0.00..221.10 rows=21339 width=0) (actual time=0.040..0.040 rows=1 loops=1)
Index Cond: (userid = $0)
Planning Time: 0.108 ms
Execution Time: 0.068 ms
(14 rows)
My basic questions are:
- Why is it so mind-bogglingly bad in the first place (in what universe does scanning all 23M rows in the table beat scanning the much smaller subset that matches the WHERE clause)?
- How can I work around this?
AFAICT, I need the planner to first reduce the original table to just the rows matching the WHERE clause, and then apply the sort and limit to that subset.
Instead, the LIMIT 1 plan walks the whole table in created order via recent_rounds_idx and filters out non-matching rows as it goes, discarding around 23M of them here, with unsurprisingly horrible results.
I've tried many dozens of syntaxes that try to build a subquery extracting the user's rounds first and only then applying the limit. But the planner sees through every one of them and sticks with the same scan over the full 23M rows.
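For completeness, one classic fence I've seen mentioned is OFFSET 0 inside a derived table, which is supposed to stop the planner from flattening the subquery into the outer query. A sketch, which I haven't verified against this data set:

select sub.id
from (
    select id, created
    from rounds
    where userid = (
        select id
        from users
        where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2')
    offset 0  -- optimization fence: keeps the planner from pulling this subquery up
) sub
order by sub.created desc
limit 1;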
Additional Attempts / Information
This article indicates that the original answer (to the question linked above) no longer works as of PostgreSQL 13, and suggests using a CTE instead:
Postgres SLOWER when a LIMIT is set: how to fix besides adding a dummy `ORDER BY`?
However, a CTE was basically my first instinct anyway, and every use of one has failed for me.
One attempt with a CTE: (VERY SLOW!)
with r as (
    select id, created
    from rounds
    where userid = (
        select id
        from users
        where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2')
)
select r.id from r order by r.created desc limit 1;
Maybe shuffling the ORDER BY and LIMIT around helps? (NO!)
with r as (
    select id, created
    from rounds
    where userid = (
        select id
        from users
        where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2')
    order by created desc
)
select r.id from r limit 1;
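In hindsight, the likely reason none of these CTEs acted as a fence: since PostgreSQL 12, plain CTEs are inlined into the outer query by default, so WITH alone no longer isolates the subquery from the planner. Forcing materialization should restore the old pre-12 behavior; a sketch I haven't benchmarked on this table:

with r as materialized (
    -- MATERIALIZED forces the CTE to be computed on its own,
    -- so only this user's rows reach the outer sort + limit
    select id, created
    from rounds
    where userid = (
        select id
        from users
        where integrationuserid = 'sample:64ce5bad-8c48-44a4-b473-5a7451980bb2')
)
select r.id
from r
order by r.created desc
limit 1;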
Solution (thanks to @jjanes for pointing me to it, and to @denis-de-bernardy for sharing it in the first place):
create index recent_rounds_idx on rounds (userid, created desc);
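As I understand it, this works because the composite index lets the planner descend straight to the given userid and then read that user's rows already sorted by created descending, so ORDER BY created DESC LIMIT 1 is satisfied by a single index probe, with no sort and no filtering pass over 23M unrelated rows.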