
I have a query that, for each row from campaigns, gets the created timestamp of the most recent row from effects:

SELECT
    id,
    (
        SELECT
            created
        FROM
            effects
        WHERE
            effects.campaignid = campaigns.id
        ORDER BY
            effects.created DESC
        LIMIT 1
    ) AS last_activity
FROM
    campaigns
WHERE
    deleted_at IS NULL
    AND id IN (53, 54);

To optimize the performance of this query I'm using the index effects_campaign_created_idx on (campaignid, created). Additionally, for another use case, there's the index effects_created_idx on (created).

For some reason, at one point, the subquery stopped using the correct index effects_campaign_created_idx and started using effects_created_idx instead, which is highly inefficient: the query takes ~5 minutes to run instead of ~40 ms.

Whenever I execute the inner query on its own (with the same campaignid as a literal), the correct index is used.
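
For example, the inner query on its own (with campaignid hard-coded to one of the ids above) looks like this, and plans over effects_campaign_created_idx as expected:

EXPLAIN ANALYZE
SELECT created
  FROM effects
 WHERE campaignid = 53
 ORDER BY created DESC
 LIMIT 1;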

What could be the reason for such behavior on the part of the query planner? Should I structure my query differently, so that the right index is chosen?

What are more advanced ways to debug the query planner behavior?


Here are the EXPLAIN ANALYZE results from executing the offending query:

explain analyze SELECT
    id,
    (
        SELECT
            created
        FROM
            effects
        WHERE
            effects.campaignid = campaigns.id
        ORDER BY
            effects.created DESC
        LIMIT 1) AS last_activity
FROM
    campaigns
WHERE
    deleted_at IS NULL
    AND id IN (53, 54);
-----------------------------------------

Seq Scan on campaigns  (cost=0.00..4.56 rows=2 width=12) (actual time=330176.476..677186.438 rows=2 loops=1)
  Filter: ((deleted_at IS NULL) AND (id = ANY ('{53,54}'::integer[])))
  Rows Removed by Filter: 45
  SubPlan 1
    ->  Limit  (cost=0.43..0.98 rows=1 width=8) (actual time=338593.165..338593.166 rows=1 loops=2)
          ->  Index Scan Backward using effects_created_idx on effects  (cost=0.43..858859.67 rows=1562954 width=8) (actual time=338593.160..338593.160 rows=1 loops=2)
                Filter: (campaignid = campaigns.id)
                Rows Removed by Filter: 14026092
Planning Time: 0.245 ms
Execution Time: 677195.239 ms

Following advice in the comments, I tried moving to MAX(created) instead of the subquery with ORDER BY created DESC LIMIT 1. Unfortunately, the results are still poor:

EXPLAIN ANALYZE SELECT
    campaigns.id,
    subquery.created
FROM
    campaigns
    LEFT JOIN (
        SELECT
            campaignid,
            MAX(created) created
        FROM
            effects
        GROUP BY
            campaignid) subquery ON campaigns.id = subquery.campaignid
WHERE
    campaigns.deleted_at IS NULL
    AND campaigns.id IN (53, 54);


Hash Right Join  (cost=667460.06..667462.46 rows=2 width=12) (actual time=30516.620..30573.091 rows=2 loops=1)
  Hash Cond: (effects.campaignid = campaigns.id)
  ->  Finalize GroupAggregate  (cost=667457.45..667459.73 rows=9 width=16) (actual time=30251.920..30308.379 rows=23 loops=1)
        Group Key: effects.campaignid
        ->  Gather Merge  (cost=667457.45..667459.55 rows=18 width=16) (actual time=30251.832..30308.271 rows=49 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Sort  (cost=666457.43..666457.45 rows=9 width=16) (actual time=30156.539..30156.544 rows=16 loops=3)
                    Sort Key: effects.campaignid
                    Sort Method: quicksort  Memory: 25kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Partial HashAggregate  (cost=666457.19..666457.28 rows=9 width=16) (actual time=30155.951..30155.957 rows=16 loops=3)
                          Group Key: effects.campaignid
                          Batches: 1  Memory Usage: 24kB
                          Worker 0:  Batches: 1  Memory Usage: 24kB
                          Worker 1:  Batches: 1  Memory Usage: 24kB
                          ->  Parallel Seq Scan on effects  (cost=0.00..637166.13 rows=5858213 width=16) (actual time=220.784..28693.182 rows=4684157 loops=3)
  ->  Hash  (cost=2.59..2.59 rows=2 width=4) (actual time=264.653..264.656 rows=2 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on campaigns  (cost=0.00..2.59 rows=2 width=4) (actual time=264.612..264.640 rows=2 loops=1)
              Filter: ((deleted_at IS NULL) AND (id = ANY ('{53,54}'::integer[])))
              Rows Removed by Filter: 45
Planning Time: 0.354 ms
JIT:
  Functions: 34
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 9.958 ms, Inlining 409.293 ms, Optimization 308.279 ms, Emission 206.936 ms, Total 934.465 ms
Execution Time: 30578.920 ms

Notes

  • It is not the case that the majority of effects rows have campaignid in (53, 54).
  • I've already reindexed and analyzed the tables.
  • [edit: the index was created with USING btree; a reconstruction of both indexes follows below]
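
For reference, the indexes as described would look roughly like this (index names, columns, and USING btree are from the question; the exact DDL is my reconstruction):

CREATE INDEX effects_campaign_created_idx ON effects USING btree (campaignid, created);
CREATE INDEX effects_created_idx ON effects USING btree (created);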
  • Try an index on `(campaignid, created DESC)` – a_horse_with_no_name Jul 26 '22 at 10:54
  • @a_horse_with_no_name I've added the index you recommended. Unfortunately it didn't resolve the issue and the inefficient index is still chosen over the efficient ones. – Alechko Jul 26 '22 at 11:00
  • Can you temporarily drop effects_created_idx and capture the plan that way? – jjanes Jul 26 '22 at 13:22
  • Why don’t you use MAX(created) and join directly, instead of this subquery? – Frank Heikens Jul 26 '22 at 15:47
  • @jjanes unfortunately I cannot do it as it is a production env and that index is being used in other flows. – Alechko Jul 26 '22 at 16:16
  • @FrankHeikens no particular reason besides that the current query worked so far. I will try your suggestion tomorrow and update if it resolved the issue. Nonetheless, it would still not explain the underlying reason this started happening. – Alechko Jul 26 '22 at 16:17
  • So use your non-prod environment to get the plans. Is the problem not reproducible there? – jjanes Jul 26 '22 at 16:45
  • Even on prod, you might be able to do it like this: `BEGIN; DROP INDEX...; EXPLAIN ...; ROLLBACK;` It will lock the table in ACCESS EXCLUSIVE mode for the duration of the EXPLAIN, but if you don't use ANALYZE that should be extremely short. Then after the ROLLBACK, other sessions can just keep using the index as if it wasn't actually dropped. The problem is that it might take a long time to get the lock, even though once granted it will be held only briefly. – jjanes Jul 26 '22 at 16:50
  • @jjanes unfortunately the issue is only reproducible on one client's production environment. The drop/EXPLAIN/rollback trick is a pretty good idea (spelled out after this list). I tried it and, as expected, when the inefficient index is not around, the efficient one `effects_campaign_created_idx` is chosen for the query. – Alechko Jul 27 '22 at 11:11
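
Spelled out, the drop-and-rollback trick from the comments looks like this (plain EXPLAIN, not EXPLAIN ANALYZE, so the lock is held only briefly):

BEGIN;
-- Takes an ACCESS EXCLUSIVE lock on effects until the transaction ends.
DROP INDEX effects_created_idx;
-- Plain EXPLAIN (no ANALYZE) keeps the transaction, and the lock, very short.
EXPLAIN
SELECT id,
       (SELECT created
          FROM effects
         WHERE effects.campaignid = campaigns.id
         ORDER BY effects.created DESC
         LIMIT 1) AS last_activity
  FROM campaigns
 WHERE deleted_at IS NULL
   AND id IN (53, 54);
-- Undoes the DROP INDEX; other sessions never see the index missing.
ROLLBACK;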

1 Answer


Try refactoring your query to eliminate the correlated subquery. Correlated subqueries containing LIMIT clauses can baffle query planners. What you want is the latest created date for each campaignid, so your subquery will be this:

SELECT campaignid, MAX(created) created
  FROM effects
 GROUP BY campaignid

You can then build it into your main query like this:

SELECT campaigns.id, subquery.created
  FROM campaigns
  LEFT JOIN (
        SELECT campaignid, MAX(created) created
          FROM effects
         GROUP BY campaignid
       ) subquery ON campaigns.id = subquery.campaignid
 WHERE campaigns.deleted_at IS NULL
   AND campaigns.id IN (53, 54);

This allows the query planner to run that subquery efficiently and just once. It will use your (campaignid,created) index.

Asking "why" about query planner output is a tricky business. Query planners are very complex beasts, and correlated subqueries add planning complexity. It's possible that the table's growth shifted the planner's statistics-based cost estimates just enough to make the other index look cheaper.
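
As a starting point for that kind of debugging, you can inspect the statistics the planner bases its estimates on and re-run the plan with extra instrumentation. A sketch (pg_stats is standard PostgreSQL; EXPLAIN's SETTINGS option needs v12+):

-- n_distinct drives the row-count estimate for a given campaignid;
-- correlation affects the estimated cost of index scans.
SELECT n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'effects'
   AND attname = 'campaignid';

-- BUFFERS shows the actual I/O per plan node; SETTINGS lists any
-- non-default planner parameters that were in effect.
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT created
  FROM effects
 WHERE campaignid = 53
 ORDER BY created DESC
 LIMIT 1;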

Pro tip: Avoid correlated subqueries whenever possible, especially in long-lived code on growing systems. It isn't always possible, though.
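
When the per-row lookup can't be eliminated, one common restructuring (just a sketch, not something tested against these tables) is a LATERAL join, which expresses the same "latest effect per campaign" lookup as a join and tends to favor a nested loop over the (campaignid, created) index, though the planner is still free to choose otherwise:

SELECT campaigns.id, latest.created AS last_activity
  FROM campaigns
  LEFT JOIN LATERAL (
        SELECT created
          FROM effects
         WHERE effects.campaignid = campaigns.id
         ORDER BY effects.created DESC
         LIMIT 1
       ) latest ON true
 WHERE campaigns.deleted_at IS NULL
   AND campaigns.id IN (53, 54);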

  • I was really hoping that switching to `MAX(created)` would do the trick. Unfortunately, it behaves quite poorly as well (though not as badly as the original query). I am adding the `EXPLAIN ANALYZE` results to the original post. – Alechko Jul 27 '22 at 11:18