1

We're upgrading a database from Postgres 9.6 to 13.4. During the upgrade, we've observing much slower query performance on the new 13.4 postgres instance.

It makes some sense that performance would be slower on a new instance - however, we're seeing queries be executed with significantly different query plans (DB schemas and indexes are of course the same).

We're trying to understand the reason for the different query plans. For example, we have a query that runs orders of magnitude slower on the new 13.4 version (e.g., 26s vs 200ms).

Is there a standard step in the "version upgrade" migration process that we are missing that would help explain the worse performance and vastly different query plans?

Edit: Example query and different query plans on Postgres 9.6 and 13.4

Query

EXPLAIN ANALYZE
SELECT
    *
FROM
    "my_table"
WHERE
    "foo" = TRUE
    AND "bar" = FALSE
    AND("baz" > '2021-09-16 18:58:42.311+00'
        OR("baz" = '2021-09-16 18:58:42.311+00'
            AND "id" > 81353542))
ORDER BY
    "baz" ASC,
    "id" ASC
LIMIT 1;

Postgres 9.6 Query Plan:

Limit  (cost=548886.22..548886.22 rows=1 width=152) (actual time=1232.777..1232.778 rows=1 loops=1)
  ->  Sort  (cost=548886.22..549884.61 rows=1996789 width=152) (actual time=1232.776..1232.776 rows=1 loops=1)
        Sort Key: baz, id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Bitmap Heap Scan on my_table  (cost=30571.88..546889.43 rows=1996789 width=152) (actual time=531.993..1014.384 rows=1914762 loops=1)
              Recheck Cond: ((foo AND (NOT bar) AND (baz > '2021-09-16 18:58:42.311+00'::timestamp with time zone)) OR (foo AND (NOT bar) AND (baz = '2021-09-16 18:58:42.311+00'::timestamp with time zone) AND (id > 81353542)))"
              Filter: (foo AND (NOT bar))
              Heap Blocks: exact=83762
              ->  BitmapOr  (cost=30571.88..30571.88 rows=1996789 width=0) (actual time=512.609..512.609 rows=0 loops=1)
                    ->  Bitmap Index Scan on my_table_foo_bar_ts_id  (cost=0.00..30370.08 rows=1996789 width=0) (actual time=512.588..512.588 rows=2704908 loops=1)
                          Index Cond: ((foo = true) AND (bar = false) AND (baz > '2021-09-16 18:58:42.311+00'::timestamp with time zone))"
                    ->  Bitmap Index Scan on my_table_foo_bar_ts_id  (cost=0.00..2.12 rows=1 width=0) (actual time=0.020..0.020 rows=0 loops=1)
                          Index Cond: ((foo = true) AND (bar = false) AND (baz = '2021-09-16 18:58:42.311+00'::timestamp with time zone) AND (id > 21153592))"
Planning time: 0.181 ms
Execution time: 1233.303 ms

Postgres 13.4 Query Plan:

Limit  (cost=0.11..0.64 rows=1 width=152) (actual time=32871.675..32871.677 rows=1 loops=1)
  ->  Index Scan using my_table_foo_bar_ts_id on my_table  (cost=0.11..1040476.82 rows=1962741 width=152) (actual time=32871.673..32871.674 rows=1 loops=1)
        Index Cond: ((foo = true) AND (bar = false))
        Filter: ((baz > '2021-09-16 18:58:42.311+00'::timestamp with time zone) OR ((baz = '2021-09-16 18:58:42.311+00'::timestamp with time zone) AND (id > 81353542)))"
        Rows Removed by Filter: 18745951
Planning Time: 0.182 ms
Execution Time: 32871.705 ms
mattnedrich
  • 7,577
  • 9
  • 39
  • 45
  • 5
    Did you run `ANALYZE;` after the upgrade? (Better, yet: `VACUUM ANALYZE;`) See: https://dba.stackexchange.com/a/187981/3684 – Erwin Brandstetter Oct 01 '21 at 16:29
  • We don't know what you did so don't know if you skipped something "standard". If vacuuming and analyzing the database doesn't fix the problem, then show the `EXPLAIN (ANALYZE, BUFFERS)` for the query for each version, and show any non-default config settings. – jjanes Oct 01 '21 at 23:26
  • @ErwinBrandstetter we ran both, no change in result. I added a sample query an query plans – mattnedrich Oct 06 '21 at 15:53
  • @jjanes I added an example query and the different query plans. Is there any more information I can provide that might help diagnose what is happening? – mattnedrich Oct 06 '21 at 15:55
  • Is `work_mem` set to the same values in both installations? –  Oct 06 '21 at 15:59
  • @a_horse_with_no_name Yes, it is set to 64 MB – mattnedrich Oct 06 '21 at 17:53
  • Have you tried a reindex on the table? Is the new version on the same OS and version as the old one? How did you upgrade? – Adrian Klaver Oct 11 '21 at 21:22
  • @AdrianKlaver yes, we have run `reindex ` This has no material impact on the query speed - in fact, running the query immediately after a `reindex` is sometimes slower. We're running on Heroku using Heroku Postgres, which I think is essentially AWS RDS Postgres. We've upgraded using the process described here: https://devcenter.heroku.com/articles/upgrading-heroku-postgres-databases, specifically, the `pg:copy` method, though we've also attempted the `pg:upgrade` approach and see the same slow query behavior.
    – mattnedrich Oct 12 '21 at 19:27
  • Is shared_buffers setting the same in both versions? The default value for the setting is pretty low. – Anton Grig Oct 15 '21 at 11:49

1 Answers1

0

here is my thoughts.

postgresql is not using bitmap index scan becasue it way overestimated the no of records it thought It's gonna return and decided to do "combined index scan" and just read the table once instead.

so I suggest do 'analyze` to make sure your stats are up to date and also maybe your indecies are corrupted so try to reindex and try again

vaccume analyze;
analyze [table];
reindex; 
eshirvana
  • 23,227
  • 3
  • 22
  • 38