Postgres uses Hash Join with Seq Scan when Inner Select Index Cond is faster

Question

Postgres is using a much heavier Seq Scan on the tracking table even though an index is available. The first query was my original attempt; it uses a Seq Scan and is therefore slow. I tried to force an Index Scan with an inner select, but Postgres converted it back to effectively the same plan with nearly the same runtime. Finally, I copied the list of ids returned by the inner select of query two into the third query. Postgres then used the Index Scan, which dramatically decreased the runtime. The third query is not viable in a production environment. What will cause Postgres to use the last query plan?

(vacuum was used on both tables)

Tables

tracking (worker_id, localdatetime) total records: 118664105
project_worker (id, project_id) total records: 12935

INDEX

CREATE INDEX tracking_worker_id_localdatetime_idx ON public.tracking USING btree (worker_id, localdatetime)

Queries

SELECT worker_id, localdatetime FROM tracking t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018

Hash Join  (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
 Hash Cond: (t.worker_id = pw.id)
  ->  Seq Scan on tracking t  (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
  ->  Hash  (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
      Buckets: 4096  Batches: 1  Memory Usage: 115kB
    ->  Seq Scan on project_worker pw  (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
          Filter: (project_id = 68475018)
          Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms

SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500)

Hash Semi Join  (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
 Hash Cond: (t.worker_id = project_worker.id)
  ->  Seq Scan on tracking t  (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
  ->  Hash  (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
      Buckets: 1024  Batches: 1  Memory Usage: 28kB
    ->  Limit  (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
          ->  Seq Scan on project_worker  (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
                Filter: (project_id = 68475018)
                Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms

SELECT worker_id, localdatetime FROM tracking t WHERE worker_id IN (322016383,316007840,...,285702579)

Index Scan using tracking_worker_id_localdatetime_idx on tracking t  (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
  Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))
Planning Time: 1.162 ms
Execution Time: 30.884 ms

(The ... stands in for the rest of the 500 ids used in the query.)

The same query run on another set of 500 ids:

Index Scan using tracking_worker_id_localdatetime_idx on tracking t  (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
  Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))
Planning Time: 2.105 ms
Execution Time: 5534.948 ms

Is your third query just reading data from the cache? What happens if you choose a different list of 500? — jjanes, Nov 12 '20 at 19:54
For better info from your plans, turn on track_io_timing and run `EXPLAIN (ANALYZE, BUFFERS)` — jjanes, Nov 12 '20 at 20:09
@jjanes I ran another set of ids without any possibility of caching. The results are similar. The *index scan* is still clearly faster due to the volume of data in the *tracking* table. — Noah Sragow, Nov 12 '20 at 21:15
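
For anyone following along, the suggestion in the comments can be tried roughly as follows. This is only a sketch: it uses the second query from the question as the example, and changing track_io_timing normally requires superuser (or an explicitly granted) privilege.

```sql
-- Record time spent on block reads/writes so it shows up in the plan output.
SET track_io_timing = on;

-- BUFFERS reports shared-buffer hits vs. reads, which helps tell
-- a cached run apart from one that had to go to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT worker_id, localdatetime
FROM tracking t
WHERE worker_id IN (
    SELECT id FROM project_worker WHERE project_id = 68475018 LIMIT 500
);
```

A plan dominated by "shared hit" counts was served from cache; large "read" counts together with I/O timings point to disk access.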

score 3 · Answer 1 · answered Nov 13 '20 at 04:01

If you want to nudge PostgreSQL towards a nested loop join, try the following:

Create an index on tracking that can be used for an index-only scan:
```
CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);
```
Make sure that tracking is VACUUMed often, so that an index-only scan is effective.
Reduce random_page_cost and increase effective_cache_size so that the optimizer prices index scans lower (but don't use insane values).

Make sure that you have good estimates on project_worker:

ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
ANALYZE project_worker;
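
The maintenance and cost-parameter steps above could be tried in a single session along these lines. This is a sketch, not a recommendation: the values 1.1 and '8GB' are placeholders chosen for illustration and should be adapted to the actual storage and memory of the server.

```sql
-- Refresh the visibility map (and statistics) so that an
-- index-only scan can avoid most heap fetches.
VACUUM (ANALYZE) tracking;

-- Session-level experiment only; both values below are assumptions.
SET random_page_cost = 1.1;        -- the default is 4.0
SET effective_cache_size = '8GB';  -- roughly the RAM available for caching

-- Re-check the plan of the first query from the question.
EXPLAIN (ANALYZE)
SELECT worker_id, localdatetime
FROM tracking t
JOIN project_worker pw ON t.worker_id = pw.id
WHERE project_id = 68475018;

-- Return to the configured defaults afterwards.
RESET random_page_cost;
RESET effective_cache_size;
```

If the plan switches to a nested loop with an index (or index-only) scan, the settings can then be made permanent in the server configuration.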

score 2 · Answer 2 · answered Nov 12 '20 at 21:46

The distribution of "worker_id" within "tracking" seems very skewed. For one thing, one of your runs of query 3 returns over 5 times as many rows as the other. For another, the estimated number of rows is 100 to 1000 times higher than the actual number. This can certainly lead to bad plans (although it is unlikely to be the complete picture).

What is the actual number of distinct values for worker_id within tracking (`select count(distinct worker_id) from tracking`)? What does the planner think this value is (`select n_distinct from pg_stats where tablename='tracking' and attname='worker_id'`)? If those values are far apart, force the planner to use a more reasonable value with `alter table tracking alter column worker_id set (n_distinct = <real value>); analyze tracking;` and check whether that changes the plans.
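
Spelled out as separate statements, that check might look like the sketch below. The value 50000 is purely hypothetical and stands in for whatever the count query actually returns.

```sql
-- Actual number of distinct worker_id values.
-- This scans the whole table, so expect it to be slow.
SELECT count(DISTINCT worker_id) FROM tracking;

-- The planner's estimate. A negative value is a fraction of the
-- row count rather than an absolute number.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'tracking' AND attname = 'worker_id';

-- Override the estimate. 50000 is a placeholder, not a measured value.
ALTER TABLE tracking ALTER COLUMN worker_id SET (n_distinct = 50000);

-- The override only takes effect at the next ANALYZE.
ANALYZE tracking;

-- To undo the override later:
-- ALTER TABLE tracking ALTER COLUMN worker_id RESET (n_distinct);
```

After the ANALYZE, rerunning EXPLAIN on the first query shows whether the row estimate for the join has moved closer to the actual row count.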
