I started by ensuring the planner had updated stats:
my_db=> vacuum analyze;
VACUUM
Time: 1401.958 ms
When only selecting foos.bar_id
, the query executes fine with an Index Only Scan on that column:
my_db=> EXPLAIN ANALYZE SELECT foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Nested Loop (cost=0.43..16203.46 rows=353198 width=4) (actual time=0.045..114.746 rows=196205 loops=1)
-> Seq Scan on bar_ids (cost=0.00..16.71 rows=871 width=4) (actual time=0.005..0.195 rows=871 loops=1)
-> Index Only Scan using index_foos_on_bar_id on foos (cost=0.43..14.80 rows=378 width=4) (actual time=0.003..0.055 rows=225 loops=871)
Index Cond: (bar_id = bar_ids.id)
Heap Fetches: 0
Planning time: 0.209 ms
Execution time: 144.364 ms
(7 rows)
Time: 145.620 ms
However, adding foos.id
causes the query to choose an extremely slow Seq Scan:
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Hash Join (cost=27.60..221339.63 rows=353198 width=8) (actual time=294.091..3341.926 rows=196205 loops=1)
Hash Cond: (foos.bar_id = bar_ids.id)
-> Seq Scan on foos (cost=0.00..182314.70 rows=7093070 width=8) (actual time=0.004..1855.900 rows=7111807 loops=1)
-> Hash (cost=16.71..16.71 rows=871 width=4) (actual time=0.454..0.454 rows=866 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 39kB
-> Seq Scan on bar_ids (cost=0.00..16.71 rows=871 width=4) (actual time=0.002..0.222 rows=871 loops=1)
Planning time: 0.237 ms
Execution time: 3371.622 ms
(8 rows)
Time: 3373.150 ms
Disabling Seq Scan forces an Index Scan on the same index, which is an order of magnitude faster than the Seq Scan:
my_db=> set enable_seqscan=false;
SET
Time: 0.801 ms
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
QUERY PLAN
Nested Loop (cost=10000000000.43..10000439554.99 rows=353198 width=8) (actual time=0.028..171.632 rows=196205 loops=1)
-> Seq Scan on bar_ids (cost=10000000000.00..10000000016.71 rows=871 width=4) (actual time=0.005..0.212 rows=871 loops=1)
-> Index Scan using index_foos_on_bar_id on foos (cost=0.43..500.86 rows=378 width=8) (actual time=0.003..0.118 rows=225 loops=871)
Index Cond: (bar_id = bar_ids.id)
Planning time: 0.186 ms
Execution time: 201.958 ms
(6 rows)
Time: 203.185 ms
Other answers say the poor planning is due to bad statistics. My statistics are up to date. What gives?
bar_ids
is a temporary table, which might be related to the insane cost estimates in the last query (Seq Scan on bar_ids (cost=10000000000.00..10000000016.71
), but explicitly running ANALYZE bar_ids
doesn't change the query plan.