I have a couple of tables in PostgreSQL 14.6, with indexes as stated here:
# \d uprn
Table "public.uprn"
Column | Type | Collation | Nullable | Default
-----------+----------------------+-----------+----------+---------
uprn | text | | |
postcode | text | | |
location | geometry(Point,4326) | | |
Indexes:
"uprn_location" gist (location)
"uprn_postcode" btree (postcode)
"uprn_uprn" btree (uprn)
# \d geometry_subdivided
Table "public.geometry_subdivided"
Column | Type | Collation | Nullable | Default
---------+----------+-----------+----------+---------
area_id | integer | | |
geom | geometry | | |
Indexes:
"subdivided_area_id" btree (area_id)
"subdivided_geom_id" gist (geom)
And this many rows (VACUUM ANALYZE
has been done post import):
# select count(*) from uprn;
count
----------
32872945
(1 row)
# select count(*) from geometry_subdivided;
count
--------
938500
(1 row)
23 of the uprn
rows have a postcode
of "M19 1TF". If I run a query joining these two tables on that postcode, looking for areas covering the points, it takes a number of seconds, using a sequence scan:
# explain analyze select area_id,count(*) from uprn u, geometry_subdivided g where st_covers(g.geom, st_transform(u.location,27700)) and postcode='M19 1TF' group by area_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=6287745.55..6292743.36 rows=16428 width=12) (actual time=6273.505..6290.003 rows=16 loops=1)
Group Key: g.area_id
-> Gather Merge (cost=6287745.55..6292414.80 rows=32856 width=12) (actual time=6273.497..6289.990 rows=48 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=6286745.52..6287622.38 rows=16428 width=12) (actual time=6254.442..6254.452 rows=16 loops=3)
Group Key: g.area_id
-> Sort (cost=6286745.52..6286983.05 rows=95010 width=4) (actual time=6254.431..6254.434 rows=69 loops=3)
Sort Key: g.area_id
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 33kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=25.29..6278890.20 rows=95010 width=4) (actual time=4756.836..6254.376 rows=69 loops=3)
-> Parallel Seq Scan on uprn u (cost=0.00..1264850.55 rows=101 width=32) (actual time=4725.962..6221.730 rows=8 loops=3)
Filter: (postcode = 'M19 1TF'::text)
Rows Removed by Filter: 10957641
-> Index Scan using subdivided_geom_id on geometry_subdivided g (cost=25.29..49643.02 rows=94 width=2040) (actual time=0.102..0.253 rows=9 loops=23)
Index Cond: (geom ~ st_transform(u.location, 27700))
Filter: st_covers(geom, st_transform(u.location, 27700))
Rows Removed by Filter: 7
Planning Time: 0.359 ms
Execution Time: 6290.100 ms
(22 rows)
But if I discourage PostgreSQL from using sequence scans, the same query takes milliseconds:
# set enable_seqscan to off;
SET
# explain analyze select area_id,count(*) from uprn u, geometry_subdivided g where st_covers(g.geom, st_transform(u.location,27700)) and postcode='M19 1TF' group by area_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=12076657.05..12078536.73 rows=17175 width=12) (actual time=7.710..7.766 rows=16 loops=1)
Group Key: g.area_id
-> Sort (cost=12076657.05..12077226.36 rows=227724 width=4) (actual time=7.681..7.714 rows=207 loops=1)
Sort Key: g.area_id
Sort Method: quicksort Memory: 34kB
-> Nested Loop (cost=31.60..12053278.12 rows=227724 width=4) (actual time=0.203..7.603 rows=207 loops=1)
-> Bitmap Heap Scan on uprn u (cost=6.31..966.53 rows=242 width=32) (actual time=0.035..0.073 rows=23 loops=1)
Recheck Cond: (postcode = 'M19 1TF'::text)
Heap Blocks: exact=10
-> Bitmap Index Scan on uprn_postcode (cost=0.00..6.25 rows=242 width=0) (actual time=0.024..0.025 rows=23 loops=1)
Index Cond: (postcode = 'M19 1TF'::text)
-> Index Scan using subdivided_geom_id on geometry_subdivided g (cost=25.29..49802.00 rows=94 width=2038) (actual time=0.116..0.322 rows=9 loops=23)
Index Cond: (geom ~ st_transform(u.location, 27700))
Filter: st_covers(geom, st_transform(u.location, 27700))
Rows Removed by Filter: 7
Planning Time: 0.259 ms
Execution Time: 7.851 ms
(17 rows)
Why is PostgreSQL not using the index in the normal query, and is there a way I can reword the query to do so? I've tried a number of combinations but without luck so far