
Problem

I have the query `SELECT p.* FROM parties p INNER JOIN bib b ON p.id = b.id;`.

On Postgres version 9.2.24, this query takes 12 or 13 minutes. Here is the output of `EXPLAIN ANALYZE` for the query:

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=156455.80..24854275.23 rows=4608791 width=46) (actual time=2516.474..407029.156 rows=4556073 loops=1)
   Hash Cond: (p.id = b.pid)
   ->  Seq Scan on parties p  (cost=0.00..22882138.64 rows=128697664 width=46) (actual time=371.099..220812.052 rows=133719944 loops=1)
   ->  Hash  (cost=98845.91..98845.91 rows=4608791 width=4) (actual time=2141.741..2141.741 rows=4608467 loops=1)
         Buckets: 524288  Batches: 1  Memory Usage: 162017kB
         ->  Seq Scan on bib b  (cost=0.00..98845.91 rows=4608791 width=4) (actual time=21.570..1199.429 rows=4608467 loops=1)
 Total runtime: 407293.833 ms
(7 rows)

On Postgres version 10.17, this query takes roughly 3 hours. Here is the output of `EXPLAIN ANALYZE` for the query:

                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.57..12597544.56 rows=10945244 width=1038) (actual time=49.143..31536008.494 rows=10854924 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Nested Loop  (cost=0.57..11502020.16 rows=2189049 width=1038) (actual time=88.401..31499828.808 rows=1809154 loops=6)
         ->  Parallel Seq Scan on bib b  (cost=0.00..147545.49 rows=2189049 width=4) (actual time=3.113..1830.876 rows=1824207 loops=6)
         ->  Index Scan using idx_parties_cmp on parties p  (cost=0.57..5.18 rows=1 width=1038) (actual time=17.262..17.265 rows=1 loops=10945244)
               Index Cond: (id = b.id)
 Planning time: 114.828 ms
 Execution time: 31536841.489 ms
(9 rows)

Notes

  • Both databases have similar data
  • Both databases have the same indexes
  • Both databases have the same configuration (see the settings-comparison sketch after this list)
  • Both databases are running with the same amount of RAM - 125 GB
  • The 9.2.24 database has more swap (11 GB vs 3 GB on 10.17)
  • I have run `vacuumdb --all --analyze-in-stages` on both
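
A minimal way to back up the "same configuration" point is to dump each server's non-default settings and diff the two outputs; the file names below are arbitrary and the query only uses the standard `pg_settings` view:

```sql
-- Run on each server, then diff the two files, e.g.:
--   psql -Atc "SELECT name || '=' || setting FROM pg_settings WHERE source <> 'default' ORDER BY name" > settings_92.txt
--   diff settings_92.txt settings_10.txt
SELECT name, setting, source
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;
```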

Questions

  • Why is the query taking drastically longer in Postgres 10.17? I realize that the second query returns ~6 million more rows, but it takes 525 minutes vs 7 minutes. Could a difference of ~6 million rows really account for a 75x slowdown?

  • Is cost measured the same way between Postgres versions? The estimated cost for the query on 10.17 is roughly half the estimated cost on version 9.2, but it takes roughly 30x as long. (The cost parameters involved are listed in the sketch after this list.)
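
For what it's worth, these are the standard GUCs that feed the planner's cost arithmetic; `parallel_setup_cost` and `parallel_tuple_cost` did not exist in 9.2 (parallel query arrived in 9.6), so the two versions are not pricing plans from identical inputs:

```sql
-- Cost-model settings on each server; the parallel_* rows will simply be absent on 9.2.
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
               'cpu_index_tuple_cost', 'cpu_operator_cost',
               'parallel_setup_cost', 'parallel_tuple_cost')
ORDER BY name;
```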

iluvfugu
  • Maybe the parallel query overloads your server. Try to run the query after `set max_parallel_workers_per_gather=0;`. – Jan 12 '22 at 12:04
  • Unrelated, but: upgrading to 10 at the beginning of 2022 is a bit futile - it will be [out of support](https://www.postgresql.org/support/versioning/) in about 10 months. You should have gone straight to 14. – Jan 12 '22 at 12:05
  • +1 to upgrading to a current version. Parallelization is one area where all database engines have issues and your query is one of those that don't benefit from parallelization - you're loading everything from `parties` that has a matching record in `bib`, and both tables have several million rows. And `The 9.2.24 database has more swap (11 GB vs 3 GB on 10.17)` is a **huge** difference. In the first case the database decided to load everything in memory and create a hash. In the second, there's not enough space to do this so it tried to perform an index scan for every row – Panagiotis Kanavos Jan 12 '22 at 12:09
  • The higher number of rows probably prevents the use of a Hash Join on the new server (because it doesn't have enough `work_mem` to do this in memory) – Jan 12 '22 at 12:14
  • Okay, thank you so much for the info @a_horse_with_no_name and @Panagiotis Kanavos. Unfortunately this is a physical machine provided by my data warehouse provider, so I have no direct control over the version, but I will ask. As for the comment about `work_mem`, I had considered this as well, and tried the same query with increased `work_mem`, but the query planner still selected the same plan. I will run `set max_parallel_workers_per_gather=0` and see if I can increase the swap. Thanks again! – iluvfugu Jan 12 '22 at 12:34
  • Update: running `set max_parallel_workers_per_gather=0;` forced the query planner to select the hash table approach as in version 9. This improved the runtime of my query to 27 minutes from ~3 hours. – iluvfugu Jan 12 '22 at 13:50
  • Obviously the data are *not* comparable. What execution plan do you get if you `SET enable_nestloop = off;` on the v10 database before running the query? – Laurenz Albe Jan 12 '22 at 14:31
  • @Laurenz Albe I `SET max_parallel_workers_per_gather=0` and ran your suggestion. It chooses a hash join and a parallel seq scan. – iluvfugu Jan 12 '22 at 14:49
  • These databases don't have the same configuration: version 10 has different options than version 9.2. Because of these differences, and the differences in the query planner, the result will be different. And as said by others, version 10 is obsolete within months and imho a waste of time. – Frank Heikens Jan 12 '22 at 15:00
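
Putting the comment suggestions together, this is roughly the set of session-level experiments discussed above, run on the v10 server (table and column names are from the question; the `work_mem` value is only an illustrative guess):

```sql
-- Disable parallel query for this session; per the update above this brought back
-- the hash-join plan and cut the runtime from ~3 hours to ~27 minutes.
SET max_parallel_workers_per_gather = 0;

-- Optionally give the hash table more room (example value, not a recommendation).
SET work_mem = '256MB';

-- Or rule out the nested loop directly, as Laurenz Albe suggested.
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT p.* FROM parties p INNER JOIN bib b ON p.id = b.id;
```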

1 Answer


The cost estimation has changed between 9.2 and 10, but not radically (assuming the configurable settings have not themselves been changed radically, which is something we can't know unless you tell us).

The biggest difference (of those probably relevant to you here) is the introduction of within-query parallelization, which of course required changes to the cost model. v10 thinks the total cost will be about 2-fold less, but it also thinks it will have 6 processes dedicated to the query, so it expects to spend roughly 3 times more total resources (the per-process nested-loop cost of about 11.5 million times 6 processes is roughly 69 million, versus about 25 million for the 9.2 hash-join plan). It probably is not getting any benefit from parallelization (because your hardware is inadequate), but it incorrectly thinks it will.

Note that the default setting of `max_parallel_workers_per_gather` in v10 is 2, but you are obviously running with a setting of at least 5. In other words, someone setting up your server apparently went out of their way to lie to PostgreSQL about how capable the server was.
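
A quick way to confirm that, and to see where the non-default value comes from, is the `pg_settings` view; putting it back to the default is sketched here on the assumption that you can run `ALTER SYSTEM` (it needs superuser):

```sql
-- Where is the current value coming from?
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('max_parallel_workers_per_gather', 'max_worker_processes');

-- Restore the v10 default for the per-gather worker limit.
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
SELECT pg_reload_conf();
```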

Now clearly that is not the only estimation problem. If we say it takes 75 times longer to return twice as much data, and forgive it a factor of 6 in parallelization credits, that still leaves it wrong by about another factor of 6. But for all we know, it was also off by that amount even in 9.2. It could be that the 6-fold estimation error still left the nested loop looking more expensive, so it wasn't chosen, and only when that error was combined with the parallel misestimation did the combined error become large enough to switch the plan.

The way to know for sure would be to force 9.2 to use the nested loop plan (see the sketch below), and see how much slower it actually is and how much more expensive it is estimated to be. Of course, if the purpose of the exercise is "how do I fix this" rather than "who do I blame for this", then doing that work against 9.2 might be pointless.
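
A minimal sketch of that experiment on the 9.2 server, assuming the same query as in the question; turning off the other join strategies pushes the planner toward the nested loop so its estimated cost and actual runtime can be compared:

```sql
-- Run on the 9.2 server. SET LOCAL keeps the changes scoped to this transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT p.* FROM parties p INNER JOIN bib b ON p.id = b.id;
ROLLBACK;
```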

jjanes