
For some reason, the same query executed locally and from my golang backend application uses different execution plans.

Flow of events: I encountered an inefficient plan for a query that my backend application runs against the database. After connecting to the same PostgreSQL database with a PG client on my local machine (using the same user as the backend application), I reproduced the query, which took >70ms. I then used my local PG client to add an index and re-execute the query, and the execution time went down to <1ms.

However, the same query keeps being executed from my backend application and still takes >70ms, which makes me think that the newly added index is not being chosen. I ruled out the backend application's transaction as a potential cause by wrapping the query in BEGIN; and COMMIT; in my local PG client, with no change in results.

Despite my best efforts, I was unable to understand the reason for this unusual behavior. Maybe someone here has an idea? How is this possible?

-- The index I added:
CREATE INDEX CONCURRENTLY mytable_temp ON mytable (poolid, id ASC) WHERE NOT used;
-- The query:
BEGIN;
EXPLAIN ANALYZE UPDATE
    mytable
SET
    used = true
WHERE
    id = (
        SELECT
            id
        FROM
            mytable
        WHERE (poolid = 9
            AND used = false
            AND((startdate IS NULL
                OR startdate <= '2021-12-03 18:12:16.952353384')
            AND(enddate IS NULL
                OR enddate >= '2021-12-03 18:12:16.952353384')))
    ORDER BY
        id ASC
    LIMIT 1
FOR UPDATE SKIP LOCKED)
RETURNING
    mytable.id,
    mytable.poolid,
    mytable.code,
    mytable.startdate,
    mytable.enddate,
    mytable.attributes,
    mytable.used,
    mytable.importid,
    mytable.created;
COMMIT;

The plan (when executed locally):

Update on mytable  (cost=0.92..2.93 rows=1 width=132) (actual time=0.091..0.092 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Limit  (cost=0.43..0.49 rows=1 width=10) (actual time=0.069..0.069 rows=1 loops=1)
          ->  LockRows  (cost=0.43..98599.17 rows=1699030 width=10) (actual time=0.068..0.069 rows=1 loops=1)
                ->  Index Scan using mytable_temp on mytable mytable_1  (cost=0.43..81608.87 rows=1699030 width=10) (actual time=0.065..0.065 rows=1 loops=1)
                      Index Cond: (poolid = 9)
                      Filter: ((NOT used) AND ((startdate IS NULL) OR (startdate <= '2021-12-03 18:12:16.952353+00'::timestamp with time zone)) AND ((enddate IS NULL) OR (enddate >= '2021-12-03 18:12:16.952353+00'::timestamp with time zone)))
  ->  Index Scan using mytable_pkey on mytable  (cost=0.43..2.45 rows=1 width=132) (actual time=0.081..0.081 rows=1 loops=1)
        Index Cond: (id = $1)
Planning Time: 0.146 ms
Execution Time: 0.120 ms
Alechko
  • Can you add the execution plan for the slow database? – Laurenz Albe Dec 06 '21 at 12:12
  • Unfortunately I am unable to reproduce the slow execution. Whenever I try to, the index is chosen and it executes quickly. I know that queries from the backend application execute slowly because of the query statistics on my SaaS DB provider (Aiven) and because of a latency tracker I implemented in that backend (it sends a summary of all queries executed and how long they took to execute). I could go ahead and have the backend print `explain analyze` output each time the query is executed, but first I wanted to see if I can figure out an underlying cause for this strange behavior. – Alechko Dec 06 '21 at 12:29
  • Is the application using prepared statements or PL/pgSQL functions? Try `ANALYZE mytable;`. – Laurenz Albe Dec 06 '21 at 12:32
  • I've already tried `ANALYZE` on `mytable`. – Alechko Dec 06 '21 at 12:45
  • The following user extensions are defined: plpgsql: 1.0, btree_gin:1.3, pg_trgm: 1.5. There are no user defined functions in `pg_stat_user_functions`. – Alechko Dec 06 '21 at 12:45
  • The application is using prepared statements (I believe it's the default setting for the pgx driver behind the scenes). I initially suspected this to be the culprit, but I would imagine that restarting the container would fix it and it did not... – Alechko Dec 06 '21 at 12:46
  • Then you could use auto_explain to capture the execution plan on the slow system. Perhaps that shows something. – Laurenz Albe Dec 06 '21 at 12:46
  • You can also try restarting the application. – Laurenz Albe Dec 06 '21 at 12:47
  • Thanks for the suggestions @LaurenzAlbe. I've already restarted the application, but I didn't try to use `auto_explain` yet. Will try that next. – Alechko Dec 06 '21 at 12:48
  • Unfortunately using `auto_explain` is not permitted as I don't have root access on this database (since it is managed by Aiven). – Alechko Dec 06 '21 at 12:56
  • My main suspect at the moment is pgx's under-the-hood use of prepared statements. Would an old prepared statement ignore new indexes? Is there a way to 'clear' a prepared statement from the database? – Alechko Dec 06 '21 at 12:59
  • 1
    Prepared statements only live for the duration of the database session. If you restart the application, they will be closed. – Laurenz Albe Dec 06 '21 at 14:15
  • Digging deeper, I notice even stranger behavior. The first executions of this query from the backend (after a restart) definitely use the index and complete within 2ms. However, after the first several executions, the execution time jumps to ~100ms. The execution time later drops back to 2ms for 2-5 queries (maybe all in the same transaction), and then goes back to ~50+ slow executions. The query parameters are always the same, except for `time.Now()`, which is used as the argument for `startDate` and `endDate` (seems unrelated). – Alechko Dec 06 '21 at 16:41

1 Answer


Investigation shows that the slow statement is a prepared statement and that the first few executions are fast. That is almost certain proof that a mis-estimated generic plan is at fault: PostgreSQL plans the first five executions of a prepared statement individually, then switches to a cached generic plan if that doesn't look more expensive, which matches the observed pattern. You can do the following:

  • improve PostgreSQL's estimates so that it doesn't choose that generic plan (best, but difficult)

  • set plan_cache_mode to force_custom_plan for this statement (from PostgreSQL v12 on; see the sketch after this list)

  • avoid using a prepared statement for this query
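For illustration, here is a minimal way to reproduce the plan switch and try the second workaround from psql. This is only a sketch: the statement name s is made up, and the parameters (9, now()) stand in for whatever the application actually sends; with the default plan_cache_mode = auto, the first five executions get custom plans and a generic plan may take over from the sixth.

-- Prepared version of the inner SELECT (illustrative only):
PREPARE s(int, timestamptz) AS
SELECT id
FROM mytable
WHERE poolid = $1
  AND used = false
  AND (startdate IS NULL OR startdate <= $2)
  AND (enddate IS NULL OR enddate >= $2)
ORDER BY id ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- The first few executions use custom plans:
EXPLAIN ANALYZE EXECUTE s(9, now());
-- ... repeat a few times ...

-- From the sixth execution on, a mis-estimated generic plan may be chosen;
-- this is where the statement would suddenly become slow.
EXPLAIN ANALYZE EXECUTE s(9, now());

-- Workaround (PostgreSQL v12 and later): force custom plans for this session.
SET plan_cache_mode = force_custom_plan;
EXPLAIN ANALYZE EXECUTE s(9, now());

DEALLOCATE s;

The same SET is what the application would have to issue on its connections if the estimates themselves cannot be fixed.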

Laurenz Albe
  • Setting `plan_cache_mode` to `force_custom_plan` on each DB connection in the pool seemed to do the trick. Now all queries are running in <2ms. This solution feels hacky at the moment; a better one would be to avoid using a prepared statement, but I don't see that the pgx package supports this... I'd be very interested to learn more about how to improve PostgreSQL's estimates, as it might improve DB performance across the entire application. – Alechko Dec 06 '21 at 18:10
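If issuing SET on every pooled connection feels too hacky, the same setting can also be attached to the database role the backend logs in with, so every new session picks it up automatically. Again only a sketch: the role name backend_user is an assumption (use whatever role the application actually connects as), and altering the role requires sufficient privileges on the managed database.

-- Make force_custom_plan the default for all future sessions of this role:
ALTER ROLE backend_user SET plan_cache_mode = force_custom_plan;

-- Check from a new session opened as that role:
SHOW plan_cache_mode;

-- To revert to the default behavior:
ALTER ROLE backend_user RESET plan_cache_mode;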