Join on UUID column not using index for multiple values

Question

I have a PostgreSQL database that I cloned.

Database 1 has varchar(36) as primary keys
Database 2 (the clone) has UUID as primary keys.

Both contain the same data. What I don't understand is why queries on Database 1 will use the index but Database 2 will not. Here's the query:

EXPLAIN (ANALYZE, BUFFERS)
select * from table1
INNER JOIN table2 on table1.id = table2.table1_id
where table1.id in (
'541edffc-7179-42db-8c99-727be8c9ffec',
'eaac06d3-e44e-4e4a-8e11-1cdc6e562996'
);

Database 1

Nested Loop  (cost=16.13..7234.96 rows=14 width=803) (actual time=0.072..0.112 rows=8 loops=1)
  Buffers: shared hit=23
  ->  Index Scan using table1_pk on table1  (cost=0.56..17.15 rows=2 width=540) (actual time=0.042..0.054 rows=2 loops=1)
"        Index Cond: ((id)::text = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::text[]))"
        Buffers: shared hit=12
  ->  Bitmap Heap Scan on table2  (cost=15.57..3599.86 rows=904 width=263) (actual time=0.022..0.023 rows=4 loops=2)
        Recheck Cond: ((table1_id)::text = (table1.id)::text)
        Heap Blocks: exact=3
        Buffers: shared hit=11
        ->  Bitmap Index Scan on table2_table1_id_fk  (cost=0.00..15.34 rows=904 width=0) (actual time=0.019..0.019 rows=4 loops=2)
              Index Cond: ((table1_id)::text = (table1.id)::text)
              Buffers: shared hit=8
Planning:
  Buffers: shared hit=416
Planning Time: 1.869 ms
Execution Time: 0.330 ms

Database 2

Gather  (cost=1000.57..1801008.91 rows=14 width=740) (actual time=11.580..42863.893 rows=8 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
  ->  Nested Loop  (cost=0.56..1800007.51 rows=6 width=740) (actual time=28573.119..42856.696 rows=3 loops=3)
        Buffers: shared hit=863 read=631539 dirtied=631979 written=2523
        ->  Parallel Seq Scan on table1  (cost=0.00..678896.46 rows=1 width=519) (actual time=28573.112..42855.524 rows=1 loops=3)
"              Filter: (id = ANY ('{541edffc-7179-42db-8c99-727be8c9ffec,eaac06d3-e44e-4e4a-8e11-1cdc6e562996}'::uuid[]))"
              Rows Removed by Filter: 2976413
              Buffers: shared hit=854 read=631536 dirtied=631979 written=2523
        ->  Index Scan using table2_table1_id_fk on table2  (cost=0.56..1117908.70 rows=320236 width=221) (actual time=1.736..1.745 rows=4 loops=2)
              Index Cond: (table1_id = table1.id)
              Buffers: shared hit=9 read=3
Planning:
  Buffers: shared hit=376 read=15
Planning Time: 43.594 ms
Execution Time: 42864.044 ms

Some notes:

The query is orders of magnitude faster in Database 1
Having only one ID in the WHERE clause activates the index in both databases
Casting to ::uuid has no impact

I understand that these results are because the query planner calculates that the cost of the index in the UUID (Database 2) case is too high. But I'm trying to understand why it thinks that and if there's something I can do.

The only conclusion that can be drawn from those execution plans is that an index scan on `id` cannot be used in database 2. Perhaps you forgot to add the primary key after all. Another reason could be that the index is extremely bloated: Does a `REINDEX` fix the problem? — Laurenz Albe, Apr 19 '22 at 06:09
@LaurenzAlbe Thanks for the follow-up. Doing a `REINDEX DATABASE ...` didn't work. Your point about "the index being bloated" could be extremely valid: there's several million records here. — mango, Apr 19 '22 at 16:13

Join on UUID column not using index for multiple values

0 Answers0