I started by ensuring the planner had updated stats:

my_db=> vacuum analyze;
VACUUM
Time: 1401.958 ms

When selecting only foos.bar_id, the query executes quickly, using an Index Only Scan on that column:

my_db=> EXPLAIN ANALYZE SELECT foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
 QUERY PLAN                                                                            
 Nested Loop  (cost=0.43..16203.46 rows=353198 width=4) (actual time=0.045..114.746 rows=196205 loops=1)
   ->  Seq Scan on bar_ids  (cost=0.00..16.71 rows=871 width=4) (actual time=0.005..0.195 rows=871 loops=1)
   ->  Index Only Scan using index_foos_on_bar_id on foos  (cost=0.43..14.80 rows=378 width=4) (actual time=0.003..0.055 rows=225 loops=871)
         Index Cond: (bar_id = bar_ids.id)
         Heap Fetches: 0
 Planning time: 0.209 ms
 Execution time: 144.364 ms
(7 rows)

Time: 145.620 ms

However, adding foos.id to the select list causes the planner to choose an extremely slow Seq Scan:

my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
 QUERY PLAN                                                            
 Hash Join  (cost=27.60..221339.63 rows=353198 width=8) (actual time=294.091..3341.926 rows=196205 loops=1)
   Hash Cond: (foos.bar_id = bar_ids.id)
   ->  Seq Scan on foos  (cost=0.00..182314.70 rows=7093070 width=8) (actual time=0.004..1855.900 rows=7111807 loops=1)
   ->  Hash  (cost=16.71..16.71 rows=871 width=4) (actual time=0.454..0.454 rows=866 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 39kB
         ->  Seq Scan on bar_ids  (cost=0.00..16.71 rows=871 width=4) (actual time=0.002..0.222 rows=871 loops=1)
 Planning time: 0.237 ms
 Execution time: 3371.622 ms
(8 rows)

Time: 3373.150 ms

Disabling sequential scans forces an Index Scan on the same index, which is an order of magnitude faster:

my_db=> set enable_seqscan=false;
SET
Time: 0.801 ms
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
 QUERY PLAN                                                                      
 Nested Loop  (cost=10000000000.43..10000439554.99 rows=353198 width=8) (actual time=0.028..171.632 rows=196205 loops=1)
   ->  Seq Scan on bar_ids  (cost=10000000000.00..10000000016.71 rows=871 width=4) (actual time=0.005..0.212 rows=871 loops=1)
   ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.43..500.86 rows=378 width=8) (actual time=0.003..0.118 rows=225 loops=871)
         Index Cond: (bar_id = bar_ids.id)
 Planning time: 0.186 ms
 Execution time: 201.958 ms
(6 rows)

Time: 203.185 ms

Other answers say the poor planning is due to bad statistics. My statistics are up to date. What gives?

bar_ids is a temporary table, which might be related to the insane cost estimates in the last query (Seq Scan on bar_ids (cost=10000000000.00..10000000016.71)), but explicitly running ANALYZE bar_ids doesn't change the query plan.

rcrogers
  • Have you tried running an analyze on the foos table to determine the most efficient execution? I've had this issue before and I have never understood why the system randomly thinks a seq scan is best. – SpaceCowboy Nov 22 '16 at 16:09
  • the first command, `vacuum analyze`, should do this for all tables if I'm understanding the docs correctly – rcrogers Nov 22 '16 at 16:13
  • `ALTER TABLE bar_ids ADD PRIMARY KEY (id);` ??? – joop Nov 22 '16 at 16:14
  • I agree it should; in my experience analysing a single table seems to work. I can't say why. Anyone else shed light on this? – SpaceCowboy Nov 22 '16 at 16:14
  • @SpaceCowboy `ANALYZE bar_ids` doesn't change the query plan. @joop `bar_ids` is a temporary table that contains nulls and duplicates, so can't add a primary key, but running this didn't change the query plan: `CREATE INDEX ON bar_ids (id)` – rcrogers Nov 22 '16 at 16:19
  • did you analyze the table after creating the index? – SpaceCowboy Nov 22 '16 at 16:22
  • Try setting `enable_hashjoin` and `enable_mergejoin` to `off`. Then run the "slow" query again - hopefully it will choose the good plan - and post the result for that. It will be interesting to compare the cost estimates for the slow hash join plan and that plan; maybe one can figure out what to tune. – Laurenz Albe Nov 22 '16 at 16:27
  • Hash_join is normally a last attempt when no usable index is available, and/or the hashed set **is assumed** to fit into memory. BTW: you *can* (and should) add a PK (or index) on a temp table, just like you can run analyze on it. BTW2: *why* do you use a temp table? (in most cases) a joined query or CTE will do exactly the same, without loss of structure. – joop Nov 22 '16 at 16:33
  • Kinda curious what the settings for `random_page_cost` and `seq_page_cost` are. – yieldsfalsehood Nov 22 '16 at 16:35
  • @SpaceCowboy yes, ran `ANALYZE bar_ids` after `CREATE INDEX` @LaurenzAlbe `enable_hashjoin=false` had the same (fast) query plan as `enable_seqscan=false`. `enable_mergejoin=false` seemed to have no effect – rcrogers Nov 22 '16 at 16:40
  • @yieldsfalsehood `random_page_cost` is 4, `seq_page_cost` is 1 – rcrogers Nov 22 '16 at 16:41
  • `random_page_cost` could likely stand to be reduced, though that's more art than science. I'd try 2 and see if that tilts the scale in your favor - given everything else it may just be that the planner is reluctant to pick the index scan because of this. With good enough modern hardware the default of 4 is a bit cautious. – yieldsfalsehood Nov 22 '16 at 16:51
  • `is a temporary table that contains nulls and duplicates,` What do you expect to gain by joining with such a table? Answer: non-matches and ... duplicates. – joop Nov 22 '16 at 16:57
  • @yieldsfalsehood that worked! please feel free to add an answer and I'll accept it. thanks! – rcrogers Nov 22 '16 at 17:00

1 Answer


Following up from the comments on the OP here.

In the case of the first query, where you're selecting only foos.bar_id, the executor can fulfill the whole request with an index only scan, which is nifty. Tacking another column that's not covered by the index onto the select list, however, means that continuing to use this index requires the typical "double read": first read the index page, then read the table page to get the remaining column's value. That implies the potential for quite a bit of random IO.
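As an aside, if this exact query shape is hot enough to matter, one way to sidestep the double read entirely is an index that covers both selected columns, so the index only scan stays available. A sketch (hypothetical index name, and note that an extra index costs disk space and write overhead):

-- Hypothetical covering index: with both select-list columns in the
-- index, foos.id can be returned without touching the heap.
CREATE INDEX index_foos_on_bar_id_and_id ON foos (bar_id, id);
ANALYZE foos;  -- refresh statistics, as suggested in the comments above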

The setting random_page_cost is interpreted relative to seq_page_cost: loosely, random IO is random_page_cost times more expensive than sequential IO (given seq_page_cost=1). With modern drives, random IO isn't that expensive, so lowering random_page_cost makes index scans look more attractive to the planner. Finding the "best" value is tricky, but starting around 2 is a decent rule of thumb.
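For example, you can trial a lower value in the current session before persisting anything (the value 2 here is just the rule-of-thumb starting point, not a prescription):

-- Trial run: SET only affects the current session.
SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT foos.id, foos.bar_id
FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;

-- If the plan and timing improve, persist it, e.g. per database:
ALTER DATABASE my_db SET random_page_cost = 2;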

Edit: I didn't have time to add these extra thoughts earlier, but it's been bugging me. If you happen upon this because you're experiencing similar issues, please don't just start fiddling with this config right out of the gate. It seemed like a fruitful approach here because the OP had already established that the statistics were fresh and reasonable, that the index scan was viable, and that the planner nonetheless had a strong preference for the table scan even though it was empirically much worse. This config isn't a silver bullet, and if your performance issues don't closely match the ones given here then this solution may not be for you. Other situations may call for query rewrites or changes to other configuration items (especially those related to memory usage and allocation).
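Before touching anything, it can also help to see what the planner is currently working with; the relevant cost and memory settings are all readable from pg_settings:

-- Read-only sanity check of the planner's current assumptions.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'effective_cache_size', 'work_mem');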

yieldsfalsehood