
After updating PostgreSQL (to PG 11), I noticed that one of the queries I was using became much slower. After running EXPLAIN ANALYZE I can see that it is now using a different index for the same query.

Among other columns, my table has an applicationid column, which is a BIGINT foreign key, and an attributes column, which is a jsonb key/value map.

The description of the coupons table is as follows (some irrelevant parts were omitted):

+------------------------+--------------------------+-------------------------------------------------------+
| Column                 | Type                     | Modifiers                                             |
|------------------------+--------------------------+-------------------------------------------------------|
| id                     | integer                  |  not null default nextval('coupons_id_seq'::regclass) |
| created                | timestamp with time zone |  not null default now()                               |
| campaignid             | bigint                   |                                                       |
| value                  | text                     |                                                       |
| expirydate             | timestamp with time zone |                                                       |
| startdate              | timestamp with time zone |                                                       |
| attributes             | jsonb                    |  not null default '{}'::jsonb                         |
| applicationid          | bigint                   |                                                       |
| deleted                | timestamp with time zone |                                                       |
| deleted_changelogid    | bigint                   |  not null default 0                                   |
| accountid              | bigint                   |  not null                                             |
| recipientintegrationid | text                     |                                                       |
+------------------------+--------------------------+-------------------------------------------------------+
Indexes:
    "coupons_applicationid_value_idx" UNIQUE, btree (applicationid, value) WHERE deleted IS NULL
    "coupons_attrs_index" gin (attributes)
    "coupons_recipientintegrationid_idx" btree (recipientintegrationid)
    "coupons_value_trgm_idx" gin (value gin_trgm_ops)

The query I'm running is (some irrelevant parts were omitted):

EXPLAIN ANALYZE SELECT 
     *,
     COUNT(*) OVER () AS total_rows 
 FROM 
     coupons
 WHERE 
     deleted IS NULL 
     AND coupons.applicationid = 2 
     AND coupons.attributes @> '{"SessionId":"1070695459"}' 
 ORDER BY 
     id ASC 
 LIMIT 1000;

The applicationid filter alone doesn't help us much (the plan below shows millions of rows matching applicationid = 2). The index that was previously used was coupons_attrs_index (over the attributes column), which produced very good results.

After the update however, the query planner started preferring the index coupons_applicationid_value_idx for some reason!

Here is output from EXPLAIN ANALYZE (some irrelevant parts were omitted):

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                               |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|   ->  Sort  (cost=64.09..64.10 rows=1 width=237) (actual time=3068.996..3068.996 rows=0 loops=1)                                                                         |
|         Sort Key: coupons.id                                                                                                                                             |
|         Sort Method: quicksort  Memory: 25kB                                                                                                                             |
|         ->  WindowAgg  (cost=0.86..64.08 rows=1 width=237) (actual time=3068.988..3068.988 rows=0 loops=1)                                                               |
|               ->  Nested Loop  (cost=0.86..64.07 rows=1 width=229) (actual time=3068.985..3068.985 rows=0 loops=1)                                                       |
|                     ->  Index Scan using coupons_applicationid_value_idx on coupons  (cost=0.43..61.61 rows=1 width=213) (actual time=3068.984..3068.984 rows=0 loops=1) |
|                           Index Cond: (applicationid = 2)                                                                                                                |
|                           Filter: (attributes @> '{"SessionId": "1070695459"}'::jsonb)                                                                                   |
|                           Rows Removed by Filter: 2344013                                                                                                                |
| Planning Time: 0.531 ms                                                                                                                                                  |
| Execution Time: 3069.076 ms                                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 3.159s (3 seconds), executed in: 3.102s (3 seconds)

Can anyone help me understand why the query planner uses a less efficient index (coupons_applicationid_value_idx instead of coupons_attrs_index) after the update?

After adding a mixed (BTREE + GIN) index on (applicationid, attributes), that index was selected, effectively solving the issue. I would still like to understand what happened, to be able to predict issues like this in the future.
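For reference, the mixed index was created along these lines (the index name here is illustrative; the btree_gin extension is needed so that a bigint column can take part in a GIN index):

-- btree_gin provides GIN operator classes for scalar types such as bigint
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Combined index covering both the applicationid equality filter
-- and the jsonb containment (@>) filter on attributes
CREATE INDEX coupons_applicationid_attrs_idx
    ON coupons USING gin (applicationid, attributes);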


[EDIT 31-01-20 11:02]: The issue returned after 24 hours. Again the wrong index was chosen by the planner and the query became slow. Running a simple ANALYZE solved it. It is still very strange that this only started happening after the update to PG 11.
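For anyone hitting the same thing, the mitigations suggested in the comments below look roughly like this (the scale factor is an example value, not a recommendation):

-- Refresh the planner statistics for the table right away
ANALYZE coupons;

-- Make autoanalyze kick in earlier for this table only
-- (the default autovacuum_analyze_scale_factor is 0.1, i.e. 10% of rows
--  must change before autoanalyze runs; 0.02 is just an example)
ALTER TABLE coupons SET (autovacuum_analyze_scale_factor = 0.02);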

  • Did you calculate statistics with `ANALYZE coupons;`? – Laurenz Albe Jan 30 '20 at 13:22
  • What was the old query's EXPLAIN ANALYZE? – jjanes Jan 30 '20 at 13:54
  • @LaurenzAlbe I haven't. Could that explain why it started choosing the wrong index right after updating to PG 11? If so, what is the best practice for updating statistics, daily / weekly? – Alechko Jan 30 '20 at 17:32
  • @jjanes Check out `CREATE EXTENSION btree_gin;`. Maybe I was misleading when I called it (BTREE + GIN). – Alechko Jan 30 '20 at 17:34
  • OK. I would not have expected that to help much, as a multicolumn GIN index should be about the same as two single-column GIN indexes, which in turn should be about the same as one GIN index (on the array/JSONB column) and one plain old BTree index (on the simple scalar column), both of which you already had. Maybe autovac/autoanalyze came along and did its work at around the same time you were making the index, and that is what caused the change in plan to happen. – jjanes Jan 30 '20 at 17:59
  • Those were my initial thoughts. It was just very strange that a PG version update triggered this behaviour. Maybe the statistics became outdated after the update... – Alechko Jan 30 '20 at 18:12
  • How exactly did you upgrade? With `pg_upgrade`? – Laurenz Albe Jan 31 '20 at 06:52
  • The DB server is running on Aiven and I do not administer it directly... – Alechko Jan 31 '20 at 08:58
  • If an `analyze` fixes the problem, you need to make auto-vacuum more aggressive so that it kicks in earlier and does more work, either for all tables or just that table. If you can't change that, set up a `cron` job that runs `analyze` manually. – a_horse_with_no_name Jan 31 '20 at 09:06
  • @a_horse_with_no_name I will follow that advice for now, but I'm also considering downgrading back to PG 10.11 to see if it helps, since the issue only started happening after the update. In the meantime, I'll educate myself about auto-vacuum. – Alechko Jan 31 '20 at 09:11
  • @AlexGordon: see e.g. [here](https://www.citusdata.com/blog/2016/11/04/autovacuum-not-the-enemy/) or [here](https://www.percona.com/blog/2018/08/10/tuning-autovacuum-in-postgresql-and-autovacuum-internals/) or [here](https://dba.stackexchange.com/questions/21068/aggressive-autovacuum-on-postgresql) –  Jan 31 '20 at 09:15

0 Answers