I have the following query that uses the psql SKIP LOCKED
feature to poll messages from a queue table sorted by the insertion time. My subquery also limits the result set by 10 since that is the maximum number of results that can be processed by the consumer.
DELETE FROM outbox_queue
WHERE id IN (
SELECT id FROM outbox_queue
WHERE delay_until IS NULL OR now() > delay_until
ORDER BY enqueued_on, id
LIMIT 10
FOR UPDATE SKIP LOCKED
)
RETURNING *
Under normal load, everything works as expected. However when I apply load tests to my system, I will occasionally (<5%) get results the range from dozens of records to even several hundred records being returned. This is confusing to me since my understanding is that the subquery should have a limit placed on the number of records returned.
I've checked my logs and verified that the records being pulled are indeed unique and are not duplicates. Once load is reduced, the behavior returns to what I would expect.
My application has three workers that poll messages from this table, and each worker has a single thread that runs this query and processes the result.
Postgres version:
PostgreSQL 10.18 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc (GCC) 7.4.0, 64-bit
What am I missing?
Edit: Also including query plans for my old query posted in problem, as well as query plan for Erwin's answer.
Original query plan:
Delete on outbox_queue (cost=1.87..85.97 rows=10 width=38) (actual time=1.324..1.358 rows=10 loops=1)
-> Nested Loop (cost=1.87..85.97 rows=10 width=38) (actual time=1.316..1.339 rows=10 loops=1)
-> HashAggregate (cost=1.44..1.54 rows=10 width=40) (actual time=1.297..1.300 rows=10 loops=1)
" Group Key: ""ANY_subquery"".id"
" -> Subquery Scan on ""ANY_subquery"" (cost=0.42..1.42 rows=10 width=40) (actual time=0.034..1.289 rows=10 loops=1)"
-> Limit (cost=0.42..1.32 rows=10 width=22) (actual time=0.030..1.280 rows=10 loops=1)
-> LockRows (cost=0.42..90221.41 rows=1009865 width=22) (actual time=0.030..1.277 rows=10 loops=1)
-> Index Scan using outbox_queue__enqueued_on on outbox_queue outbox_queue_1 (cost=0.42..80122.76 rows=1009865 width=22) (actual time=0.023..1.256 rows=10 loops=1)
Filter: ((delay_until IS NULL) OR (now() > delay_until))
-> Index Scan using outbox_queue_pkey on outbox_queue (cost=0.42..8.44 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=10)
" Index Cond: (id = ""ANY_subquery"".id)"
Planning time: 0.230 ms
Execution time: 1.407 ms
Query plan for Erwin's answer. As you can see, the LIMIT is now being evaluated outside the nested loop, whereas previously it was being evaluated inside.
Delete on outbox_queue (cost=1.97..86.07 rows=10 width=38) (actual time=0.065..0.094 rows=10 loops=1)
CTE sel
-> Limit (cost=0.42..1.32 rows=10 width=22) (actual time=0.030..0.039 rows=10 loops=1)
-> LockRows (cost=0.42..90221.41 rows=1009865 width=22) (actual time=0.029..0.037 rows=10 loops=1)
-> Index Scan using outbox_queue__enqueued_on on outbox_queue outbox_queue_1 (cost=0.42..80122.76 rows=1009865 width=22) (actual time=0.024..0.027 rows=10 loops=1)
Filter: ((delay_until >= now()) IS NOT TRUE)
-> Nested Loop (cost=0.65..84.75 rows=10 width=38) (actual time=0.061..0.080 rows=10 loops=1)
-> HashAggregate (cost=0.23..0.33 rows=10 width=40) (actual time=0.053..0.056 rows=10 loops=1)
Group Key: sel.id
-> CTE Scan on sel (cost=0.00..0.20 rows=10 width=40) (actual time=0.034..0.049 rows=10 loops=1)
-> Index Scan using outbox_queue_pkey on outbox_queue (cost=0.42..8.44 rows=1 width=14) (actual time=0.002..0.002 rows=1 loops=10)
Index Cond: (id = sel.id)
Planning time: 0.153 ms
Execution time: 0.174 ms
And here is my schema for the outbox_queue
table.
create table if not exists outbox_queue (
id bigserial not null primary key,
message_type varchar not null,
body varchar not null,
enqueued_on timestamptz not null,
delay_until timestamptz,
attributes jsonb not null
);
create index if not exists outbox_queue__enqueued_on on outbox_queue(enqueued_on);
create index if not exists outbox_queue__delay_until on outbox_queue(delay_until);