PostgreSQL EXPLAIN gives two different plans for same query

Question

I have a SQL export of a cloud-based Postgres DB that I've imported locally (Alpine PG 14.5 on Docker). I'm having an issue where a query returns two different results on each one (both are running 14.5). The remote machine has 7.5GB memory and 2 CPUs. Locally I'm running an M1 Macbook Pro with 32GB of memory.

The issue is execution speed. The local one completes on my test query in ~15ms, the cloud takes ~5000ms. the 'websites' column is JSONB with a gin index applied. I believe that the data/table structure in both DBs is completely identical given the export/import process. There are just over 1,000,000 records in the users table.

EXPLAIN SELECT "users".* FROM "users" WHERE "users"."group_id" = 1 AND (websites @> '[{"url":"test.com/route"}]' OR websites @> '[{"url":"site.com/place"}]' OR websites @> '[{"url":"other.com/location"}]') ORDER BY "users"."id" ASC LIMIT 1

This is the result on my local machine:

Limit  (cost=1314.34..1314.34 rows=1 width=1188)
  ->  Sort  (cost=1314.34..1315.10 rows=306 width=1188)
        Sort Key: id
        ->  Bitmap Heap Scan on users  (cost=122.53..1312.81 rows=306 width=1188)
              Recheck Cond: ((profiles @> '[{\"url\": \"test.com/route\"}]'::jsonb) OR (profiles @> '[{\"url\": \"site.com/place\"}]'::jsonb) OR (profiles @> '[{\"url\": \"other.com/location\"}]'::jsonb))
              Filter: (group_id = 1)
              ->  BitmapOr  (cost=122.53..122.53 rows=306 width=0)
                    ->  Bitmap Index Scan on index_users_on_profiles  (cost=0.00..40.77 rows=102 width=0)
                          Index Cond: (profiles @> '[{\"url\": \"test.com/route\"}]'::jsonb)
                    ->  Bitmap Index Scan on index_users_on_profiles  (cost=0.00..40.77 rows=102 width=0)
                          Index Cond: (profiles @> '[{\"url\": \"site.com/place\"}]'::jsonb)
                    ->  Bitmap Index Scan on index_users_on_profiles  (cost=0.00..40.77 rows=102 width=0)
                          Index Cond: (profiles @> '[{\"url\": \"linkedin.com/in/haidar-khan-62a27251\"}]'::jsonb)

And this is the result from the remote server:

Limit  (cost=0.43..3009.22 rows=1 width=1082)
   ->  Index Scan using users_pkey on users  (cost=0.43..914673.14 rows=304 width=1082)
         Filter: ((group_id = '1'::bigint) AND ((websites @> '[{"url": "test.com/route"}]'::jsonb) OR (websites @> '[{"url": "site.com/place"}]'::jsonb) OR (websites @> '[{"url": "other.com/location"}]'::jsonb)))

Any thought on settings, machine configurations, etc. that could be causing the strategies to differ?

Looks like the remote server is either lacking (valid) indexes or it has messed up statistics. Try `ANALYZE` on the table. — Laurenz Albe, May 16 '23 at 16:40
I'm not getting anything back from ```ANALYZE users```, but it is executing, and took ~717ms locally and ~3865ms on the remote server. — jbeck, May 16 '23 at 16:52
Could you please share the results from explain (analyze, verbose, buffers) for these statements? — Frank Heikens, May 16 '23 at 17:50
If it is not enabled it uses default constants which are really high to generate plan. If it uses default contants it basically generates basic plan. You do not fetch or complex remote server process, but on the local server planner tries to generate possible plans depending on estimation. It might not be directly effect on it, but it worths to try it. — Umut TEKİN, May 16 '23 at 19:28
Those plans don't match the query. They are missing any Limit node, for example. — jjanes, May 17 '23 at 02:23
@jjanes bad code block formatting on my part ... the first line was being cut off. — jbeck, May 18 '23 at 19:25

score 0 · Answer 1 · answered May 18 '23 at 21:01

The slower plan is walking the index in the order specified by the ORDER BY, and will stop once it finds the first row meeting the filter. It assumes that this will happen very quickly, after walking only about 1/300 of the index. Based on the timings you report, it actually has to walk way farther than that (if you had done EXPLAIN ANALYZE, we could directly see how far it walked and wouldn't need to guess)

But why is only one system making that mistake? That is hard to say from the info at hand. Assuming the cost parameters are the same on both systems (are they?), it thinks the first plan is faster and so should use it on both systems (although the ratio by which it thinks it is faster is way off from the ratio by which it actually is faster--but it is still in the right direction). Maybe the gin index is actually missing or invalid on the 2nd system?

To get some clarity there, you would want to force the 2nd system to use a different plan. You can do that by changing the query to end with ORDER BY "users"."id"+0 ASC LIMIT 1. That dummy arithmetic will prevent it from using the ordering index. If it now uses the 1st plan, then we can see what the cost of that plan is on that system. If it doesn't, we would probably suspect the gin index is not usable.

PostgreSQL EXPLAIN gives two different plans for same query

1 Answers1