
For some reason, the same query executed locally and from my golang backend application uses different execution plans.

Flow of events: I encountered an inefficient plan for a query that my backend application runs against the database. After connecting to the same PostgreSQL database with a PG client on my local machine (using the same user as the backend application), I reproduced the query, which took >70ms. I then used my local PG client to add an index and re-execute the query, and the execution time went down to <1ms.

However, the same query keeps being executed from my backend application and still takes >70ms, which makes me think that the newly added index is not being chosen. I ruled out the backend application's transaction as a potential cause by wrapping the query in BEGIN; and COMMIT; in my local PG client, with no change in results.

Despite my best efforts, I was unable to understand the reason for this unusual behavior. Maybe someone here has an idea? How is this possible?

-- The index I added:
CREATE INDEX CONCURRENTLY mytable_temp ON mytable (poolid, id ASC) WHERE NOT used;
-- The query:
BEGIN;
EXPLAIN ANALYZE UPDATE
    mytable
SET
    used = true
WHERE
    id = (
        SELECT
            id
        FROM
            mytable
        WHERE (poolid = 9
            AND used = false
            AND((startdate IS NULL
                OR startdate <= '2021-12-03 18:12:16.952353384')
            AND(enddate IS NULL
                OR enddate >= '2021-12-03 18:12:16.952353384')))
    ORDER BY
        id ASC
    LIMIT 1
FOR UPDATE SKIP LOCKED)
RETURNING
    mytable.id,
    mytable.poolid,
    mytable.code,
    mytable.startdate,
    mytable.enddate,
    mytable.attributes,
    mytable.used,
    mytable.importid,
    mytable.created;
COMMIT;

The plan (when executed locally):

Update on mytable  (cost=0.92..2.93 rows=1 width=132) (actual time=0.091..0.092 rows=1 loops=1)
  InitPlan 1 (returns $1)
    ->  Limit  (cost=0.43..0.49 rows=1 width=10) (actual time=0.069..0.069 rows=1 loops=1)
          ->  LockRows  (cost=0.43..98599.17 rows=1699030 width=10) (actual time=0.068..0.069 rows=1 loops=1)
                ->  Index Scan using mytable_temp on mytable mytable_1  (cost=0.43..81608.87 rows=1699030 width=10) (actual time=0.065..0.065 rows=1 loops=1)
                      Index Cond: (poolid = 9)
                      Filter: ((NOT used) AND ((startdate IS NULL) OR (startdate <= '2021-12-03 18:12:16.952353+00'::timestamp with time zone)) AND ((enddate IS NULL) OR (enddate >= '2021-12-03 18:12:16.952353+00'::timestamp with time zone)))
  ->  Index Scan using mytable_pkey on mytable  (cost=0.43..2.45 rows=1 width=132) (actual time=0.081..0.081 rows=1 loops=1)
        Index Cond: (id = $1)
Planning Time: 0.146 ms
Execution Time: 0.120 ms
Alechko
  • Can you add the execution plan for the slow database? – Laurenz Albe Dec 06 '21 at 12:12
  • Unfortunately I am unable to reproduce the slow execution. Whenever I try to, the index is chosen and it executes quickly. I know that queries from the backend application execute slowly because of the query statistics on my SaaS DB provider (Aiven) and because of a latency tracker I implemented in that backend (it sends a summary of all queries executed and how long they took to execute). I could go ahead and have the backend print `explain analyze` output each time the query is executed, but first I wanted to see if I can figure out an underlying cause for this strange behavior. – Alechko Dec 06 '21 at 12:29
  • Is the application using prepared statements or PL/pgSQL functions? Try `ANALYZE mytable;`. – Laurenz Albe Dec 06 '21 at 12:32
  • I've already tried `ANALYZE` on `mytable`. – Alechko Dec 06 '21 at 12:45
  • The following user extensions are defined: plpgsql: 1.0, btree_gin:1.3, pg_trgm: 1.5. There are no user defined functions in `pg_stat_user_functions`. – Alechko Dec 06 '21 at 12:45
  • The application is using prepared statements (I believe it's the default setting for the pgx driver behind the scenes). I initially suspected this to be the culprit, but I would imagine that restarting the container would fix it and it did not... – Alechko Dec 06 '21 at 12:46
  • Then you could use auto_explain to capture the execution plan on the slow system. Perhaps that shows something. – Laurenz Albe Dec 06 '21 at 12:46
  • You can also try restarting the application. – Laurenz Albe Dec 06 '21 at 12:47
  • Thanks for the suggestions @LaurenzAlbe. I've already restarted the application, but I didn't try to use `auto_explain` yet. Will try that next. – Alechko Dec 06 '21 at 12:48
  • Unfortunately using `auto_explain` is not permitted as I don't have root access on this database (since it is managed by Aiven). – Alechko Dec 06 '21 at 12:56
  • My main suspect at the moment is pgx's under-the-hood use of prepared statements. Would an old prepared statement ignore new indexes? Is there a way to 'clear' a prepared statement from the database? – Alechko Dec 06 '21 at 12:59
  • 1
    Prepared statements only live for the duration of the database session. If you restart the application, they will be closed. – Laurenz Albe Dec 06 '21 at 14:15
  • Digging deeper, I notice even stranger behavior. The first executions of this query from the backend (after a restart) definitely use the index and complete within 2ms. However, after the first several executions, the execution time jumps to ~100ms. The execution time later drops back to 2ms for 2-5 queries (maybe all in the same transaction), and then goes back to ~50+ slow executions. The query parameters are always the same, except for `time.Now()`, which is used as the argument for `startDate` and `endDate` (seems unrelated). – Alechko Dec 06 '21 at 16:41

1 Answer


Investigation shows that the slow statement is a prepared statement and that the first few executions are fast. That is almost certain proof that a mis-estimated generic plan is at fault: PostgreSQL plans the first five executions of a prepared statement individually, then switches to a cached generic plan if that doesn't look more expensive, which matches the observed pattern. You can do the following:

  • improve PostgreSQL's estimates so that it doesn't choose that generic plan (best, but difficult)

  • set plan_cache_mode to force_custom_plan for this statement (from PostgreSQL v12 on; see the sketch after this list)

  • avoid using a prepared statement for this query
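For illustration, here is a minimal way to reproduce the plan switch and try the second workaround from psql. This is only a sketch: the statement name s is made up, and the parameters (9, now()) stand in for whatever the application actually sends; with the default plan_cache_mode = auto, the first five executions get custom plans and a generic plan may take over from the sixth.

-- Prepared version of the inner SELECT (illustrative only):
PREPARE s(int, timestamptz) AS
SELECT id
FROM mytable
WHERE poolid = $1
  AND used = false
  AND (startdate IS NULL OR startdate <= $2)
  AND (enddate IS NULL OR enddate >= $2)
ORDER BY id ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- The first few executions use custom plans:
EXPLAIN ANALYZE EXECUTE s(9, now());
-- ... repeat a few times ...

-- From the sixth execution on, a mis-estimated generic plan may be chosen;
-- this is where the statement would suddenly become slow.
EXPLAIN ANALYZE EXECUTE s(9, now());

-- Workaround (PostgreSQL v12 and later): force custom plans for this session.
SET plan_cache_mode = force_custom_plan;
EXPLAIN ANALYZE EXECUTE s(9, now());

DEALLOCATE s;

The same SET is what the application would have to issue on its connections if the estimates themselves cannot be fixed.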

Laurenz Albe
  • Setting `plan_cache_mode` to `force_custom_plan` on each DB connection in the pool seemed to do the trick. Now all queries are running in <2ms. This solution feels hacky at the moment; a better one would be to avoid using a prepared statement, but I don't see that the pgx package supports this... I'd be very interested to learn more about how to improve PostgreSQL's estimates, as it might improve DB performance across the entire application. – Alechko Dec 06 '21 at 18:10
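If issuing SET on every pooled connection feels too hacky, the same setting can also be attached to the database role the backend logs in with, so every new session picks it up automatically. Again only a sketch: the role name backend_user is an assumption (use whatever role the application actually connects as), and altering the role requires sufficient privileges on the managed database.

-- Make force_custom_plan the default for all future sessions of this role:
ALTER ROLE backend_user SET plan_cache_mode = force_custom_plan;

-- Check from a new session opened as that role:
SHOW plan_cache_mode;

-- To revert to the default behavior:
ALTER ROLE backend_user RESET plan_cache_mode;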