
This is a follow-up to this issue I posted a while ago.

I have the following code:

SET work_mem = '16MB'; 
SELECT      s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset 
FROM        rm_o_resource_usage_instance_splits_new s 
INNER JOIN  rm_o_resource_usage r ON s.usage_id = r.id 
INNER JOIN  scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id 
WHERE       r.schedule_id = 10 
ORDER BY    r.resource_id, s.start_date

When I run EXPLAIN (ANALYZE, BUFFERS) I get the following:

Sort  (cost=3724.02..3724.29 rows=105 width=89) (actual time=245.802..247.573 rows=22302 loops=1)
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6692kB
  Buffers: shared hit=198702 read=5993 written=612
  ->  Nested Loop  (cost=703.76..3720.50 rows=105 width=89) (actual time=1.898..164.741 rows=22302 loops=1)
        Buffers: shared hit=198702 read=5993 written=612
        ->  Hash Join  (cost=703.34..3558.54 rows=105 width=101) (actual time=1.815..11.259 rows=22302 loops=1)
              Hash Cond: (s.usage_id = r.id)
              Buffers: shared hit=3 read=397 written=2
              ->  Bitmap Heap Scan on rm_o_resource_usage_instance_splits_new s  (cost=690.61..3486.58 rows=22477 width=69) (actual time=1.782..5.820 rows=22302 loops=1)
                    Recheck Cond: (solution = 10)
                    Heap Blocks: exact=319
                    Buffers: shared hit=2 read=396 written=2
                    ->  Bitmap Index Scan on rm_o_resource_usage_instance_splits_new_solution_idx  (cost=0.00..685.00 rows=22477 width=0) (actual time=1.609..1.609 rows=22302 loops=1)
                          Index Cond: (solution = 10)
                          Buffers: shared hit=2 read=77
              ->  Hash  (cost=12.66..12.66 rows=5 width=48) (actual time=0.023..0.023 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    Buffers: shared hit=1 read=1
                    ->  Bitmap Heap Scan on rm_o_resource_usage r  (cost=4.19..12.66 rows=5 width=48) (actual time=0.020..0.020 rows=1 loops=1)
                          Recheck Cond: (schedule_id = 10)
                          Heap Blocks: exact=1
                          Buffers: shared hit=1 read=1
                          ->  Bitmap Index Scan on rm_o_resource_usage_sched  (cost=0.00..4.19 rows=5 width=0) (actual time=0.017..0.017 rows=1 loops=1)
                                Index Cond: (schedule_id = 10)
                                Buffers: shared read=1
        ->  Index Scan using scheduledactivities_activity_index_idx on scheduledactivities sa  (cost=0.42..1.53 rows=1 width=16) (actual time=0.004..0.007 rows=1 loops=22302)
              Index Cond: (activity_index = s.activity_index)
              Filter: (solution_id = 10)
              Rows Removed by Filter: 5
              Buffers: shared hit=198699 read=5596 written=610
Planning time: 7.070 ms
Execution time: 248.691 ms

Every time I run EXPLAIN, I get roughly the same results. The execution time is always between 170ms and 250ms, which, to me, is perfectly fine. However, when this query is run through a C++ project (using PQexec(conn, query), where conn is a dedicated connection and query is the above query), the time it takes seems to vary widely. In general the query is very quick and you don't notice a delay. The problem is that, on occasion, this query takes 2 to 3 minutes to complete.

If I open the pgadmin, and have a look at the "server activity" for the database, there's about 30 or so connections, mostly sitting at "idle". The above query's connection is marked as "active", and will stay as "active" for several minutes.
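
For reference, roughly the same information pgAdmin shows can be pulled with a query against the standard pg_stat_activity view (nothing here is specific to my setup):

-- What every connection to the current database is doing right now
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 80) AS current_query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start;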

I am at a loss as to why it randomly takes several minutes to complete the same query, with no change in the data in the DB either. I have tried increasing work_mem, which didn't make any difference (nor did I really expect it to). Any help or suggestions would be greatly appreciated.

There aren't any more specific tags, but I'm currently using Postgres 10.11; it's also been an issue on other versions of 10.x. The system is a Xeon quad-core @ 3.4GHz, with an SSD and 24GB of memory.


Per jjanes's suggestion, I put in auto_explain.
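The settings I used were roughly the following (a sketch; the exact thresholds were my choice, based on the 30s suggestion in the comments):

-- auto_explain can also be preloaded for all sessions via
-- session_preload_libraries = 'auto_explain' in postgresql.conf
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s'; -- only log queries slower than this
SET auto_explain.log_analyze = on;         -- include actual times and row counts
SET auto_explain.log_buffers = on;         -- include buffer usage
SET auto_explain.log_verbose = on;         -- include the Output: lines

Eventually it caught this output: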

duration: 128057.373 ms 
plan:
Query Text: SET work_mem = '32MB';SELECT        s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset FROM     rm_o_resource_usage_instance_splits_new s INNER JOIN    rm_o_resource_usage r ON s.usage_id = r.id INNER JOIN  scheduledactivities sa ON s.activity_index = sa.activity_index AND r.schedule_id = sa.solution_id and s.solution = sa.solution_id WHERE      r.schedule_id = 12642 ORDER BY   r.resource_id, s.start_date
Sort  (cost=14.36..14.37 rows=1 width=98) (actual time=128042.083..128043.287 rows=21899 loops=1)
  Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6585kB
  Buffers: shared hit=21198435 read=388 dirtied=119
  ->  Nested Loop  (cost=0.85..14.35 rows=1 width=98) (actual time=4.995..127958.935 rows=21899 loops=1)
        Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 705476285
        Buffers: shared hit=21198435 read=388 dirtied=119
        ->  Nested Loop  (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)
              Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
              Inner Unique: true
              Join Filter: (s.usage_id = r.id)
              Buffers: shared hit=22102 read=388 dirtied=119
              ->  Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s  (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
                    Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
                    Index Cond: (s.solution = 12642)
                    Buffers: shared hit=203 read=388 dirtied=119
              ->  Seq Scan on public.rm_o_resource_usage r  (cost=0.00..1.29 rows=1 width=57) (actual time=0.002..0.002 rows=1 loops=21899)
                    Output: r.id, r.schedule_id, r.resource_id
                    Filter: (r.schedule_id = 12642)
                    Rows Removed by Filter: 26
                    Buffers: shared hit=21899
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12642)
              Buffers: shared hit=21176333

EDIT: Full definitions of the tables are below:

CREATE TABLE public.rm_o_resource_usage_instance_splits_new
(
    start_time integer NOT NULL,
    end_time integer NOT NULL,
    resources jsonb NOT NULL,
    activity_index integer NOT NULL,
    usage_id bigint NOT NULL,
    start_date text COLLATE pg_catalog."default" NOT NULL,
    end_date text COLLATE pg_catalog."default" NOT NULL,
    solution bigint NOT NULL,
    CONSTRAINT rm_o_resource_usage_instance_splits_new_pkey PRIMARY KEY (start_time, activity_index, usage_id),
    CONSTRAINT rm_o_resource_usage_instance_splits_new_solution_fkey FOREIGN KEY (solution)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT rm_o_resource_usage_instance_splits_new_usage_id_fkey FOREIGN KEY (usage_id)
        REFERENCES public.rm_o_resource_usage (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_activity_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (activity_index ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_solution_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (solution ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_instance_splits_new_usage_idx
    ON public.rm_o_resource_usage_instance_splits_new USING btree
    (usage_id ASC NULLS LAST)
    TABLESPACE pg_default;
CREATE TABLE public.rm_o_resource_usage
(
    id bigint NOT NULL DEFAULT nextval('rm_o_resource_usage_id_seq'::regclass),
    schedule_id bigint NOT NULL,
    resource_id text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT rm_o_resource_usage_pkey PRIMARY KEY (id),
    CONSTRAINT rm_o_resource_usage_schedule_id_fkey FOREIGN KEY (schedule_id)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_idx
    ON public.rm_o_resource_usage USING btree
    (id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX rm_o_resource_usage_sched
    ON public.rm_o_resource_usage USING btree
    (schedule_id ASC NULLS LAST)
    TABLESPACE pg_default;
CREATE TABLE public.scheduledactivities
(
    id bigint NOT NULL DEFAULT nextval('scheduledactivities_id_seq'::regclass),
    solution_id bigint NOT NULL,
    activity_id text COLLATE pg_catalog."default" NOT NULL,
    sequence_index integer,
    startminute integer,
    finishminute integer,
    issue text COLLATE pg_catalog."default",
    activity_index integer NOT NULL,
    is_objective boolean NOT NULL,
    usedresourceset integer DEFAULT '-1'::integer,
    start timestamp without time zone,
    finish timestamp without time zone,
    is_ore boolean,
    is_ignored boolean,
    CONSTRAINT scheduled_activities_pkey PRIMARY KEY (id),
    CONSTRAINT scheduledactivities_solution_id_fkey FOREIGN KEY (solution_id)
        REFERENCES public.rm_o_schedule_stats (id) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX scheduled_activities_activity_id_idx
    ON public.scheduledactivities USING btree
    (activity_id COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduled_activities_id_idx
    ON public.scheduledactivities USING btree
    (id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduled_activities_idx
    ON public.scheduledactivities USING btree
    (solution_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX scheduledactivities_activity_index_idx
    ON public.scheduledactivities USING btree
    (activity_index ASC NULLS LAST)
    TABLESPACE pg_default;

EDIT: Additional output from auto_explain after adding index on scheduledactivities (solution_id, activity_index)

  Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
  Sort Key: r.resource_id, s.start_date
  Sort Method: quicksort  Memory: 6283kB
  Buffers: shared hit=20159117 read=375 dirtied=190
  ->  Nested Loop  (cost=0.85..10.76 rows=1 width=100) (actual time=5.518..122489.627 rows=20761 loops=1)
        Output: s.start_date, s.end_date, s.resources, s.activity_index, r.resource_id, sa.usedresourceset
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 668815615
        Buffers: shared hit=20159117 read=375 dirtied=190
        ->  Nested Loop  (cost=0.42..5.80 rows=1 width=112) (actual time=0.057..217.563 rows=20761 loops=1)
              Output: s.start_date, s.end_date, s.resources, s.activity_index, s.solution, r.resource_id, r.schedule_id
              Inner Unique: true
              Join Filter: (s.usage_id = r.id)
              Buffers: shared hit=20947 read=375 dirtied=190
              ->  Index Scan using rm_o_resource_usage_instance_splits_new_solution_idx on public.rm_o_resource_usage_instance_splits_new s  (cost=0.42..4.44 rows=1 width=69) (actual time=0.049..17.622 rows=20761 loops=1)
                    Output: s.start_time, s.end_time, s.resources, s.activity_index, s.usage_id, s.start_date, s.end_date, s.solution
                    Index Cond: (s.solution = 12644)
                    Buffers: shared hit=186 read=375 dirtied=190
              ->  Seq Scan on public.rm_o_resource_usage r  (cost=0.00..1.35 rows=1 width=59) (actual time=0.002..0.002 rows=1 loops=20761)
                    Output: r.id, r.schedule_id, r.resource_id
                    Filter: (r.schedule_id = 12644)
                    Rows Removed by Filter: 22
                    Buffers: shared hit=20761
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.94 rows=1 width=16) (actual time=0.007..4.654 rows=32216 loops=20761)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12644)
              Buffers: shared hit=20138170

The easiest way to reproduce the issue is to add more values to the three tables. I didn't delete any, only did a few thousand INSERTs.
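
For illustration, a bulk insert along these lines (into scheduledactivities here; the other two tables got similar treatment) is the kind of change that triggers it. The values are made up, and the solution id has to already exist in rm_o_schedule_stats because of the foreign key:

-- Hypothetical reproduction: pile a few thousand rows onto a single solution_id
INSERT INTO public.scheduledactivities (solution_id, activity_id, activity_index, is_objective)
SELECT 12642, 'activity_' || g, g, false
FROM generate_series(1, 20000) AS g;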

ChrisMM
  • I would set up `auto_explain` with `log_min_duration` of about 30s, so that you can capture the execution plan of the query while it is being slow. That would remove a lot of guesswork about what might be going wrong when the query is slow--you can directly see what it is doing when it is being slow. – jjanes Feb 14 '20 at 20:36
  • @jjanes, I've turned it on and will see what happens. I've tried two dozen times since turning it on, and each one runs no problem now! So I'll keep trying to reproduce it, and will update the question when/if I can. – ChrisMM Feb 14 '20 at 21:02
  • @jjanes, and got it to happen again, so updated question – ChrisMM Feb 14 '20 at 21:26
  • `Join Filter: (s.activity_index = sa.activity_index) Rows Removed by Join Filter: 705476285` That is a lot of results being thrown away. This should have been an indexed join. Please add the definitions for the tables (including PK/FK and secondary indexes) to your question. Also run vacuum analyze on the tables. – wildplasser Feb 15 '20 at 13:16
  • @wildplasser, I've updated the question to include the definitions of the three tables. Copied the SQL from pgAdmin. Also, vacuum analyze runs daily on the entire DB. – ChrisMM Feb 15 '20 at 14:27
  • [Not the answer] 1) Don't store date+time in separate fields (and certainly not in text fields). 2) Run vacuum full analyze on the tables. 3) What is your setting for random_page_cost? – wildplasser Feb 15 '20 at 15:09
  • @wildplasser, I've never changed it, so it would be the default of 4, I believe. – ChrisMM Feb 15 '20 at 15:37
  • More aggressive `autovacuum` seems to be the solution IMHO. Are there any `idle_in_transaction` sessions? – wildplasser Feb 15 '20 at 16:01
  • @wildplasser There's occasionally some `idle_in_transactions`, but not often. How would I make the vacuum more aggressive? I've increased its frequency from 24 hours to 12. I can add `FULL`, but I'm thinking that would probably cause too many locks? – ChrisMM Feb 15 '20 at 16:18
  • https://www.2ndquadrant.com/en/blog/when-autovacuum-does-not-vacuum/ – wildplasser Feb 15 '20 at 16:47
  • What happens if you drop index `scheduled_activities_idx`? Since it thinks it will hit that index only once, and that it will return only one row when it does hit it, the planner sees no point in trying to optimize by using the index with an extra column, as it thinks just the first one will be unique. Both of those estimations are extremely wrong, however. – jjanes Feb 15 '20 at 17:58
  • @jjanes, There are other links to the table which only use `solution_id`, instead of both `solution_id, activity_index`, would removing that index not affect those queries? Nevertheless, I tried it, and still get the same thing. Still says `rows=1` – ChrisMM Feb 15 '20 at 19:29
  • They can keep using the new index which has solution_id as a prefix. I wouldn't expect any of the `rows=1` to change, I would expect the `rows=32216` to change, as well as the performance. – jjanes Feb 15 '20 at 19:38
  • @jjanes, sorry, I'm not really familiar with how to read the output from the explain function. This is what it said: `-> Nested Loop (cost=0.85..16.29 rows=1 width=100) (actual time=0.544..129614.745 rows=20761 loops=1)` – ChrisMM Feb 15 '20 at 19:47
  • The statistics are maintained by the vacuum daemon, which is too inactive. It also maintains the free-space map (used by the bitmap scan). – wildplasser Feb 15 '20 at 19:49

2 Answers

->  Index Scan using .. s  (cost=0.42..8.44 rows=1 width=69) (actual time=0.082..17.418 rows=21899 loops=1)
                    Index Cond: (s.solution = 12642)

The planner thinks it will find 1 row, and instead finds 21899. That error can pretty clearly lead to bad plans. And a single equality condition should be estimated quite accurately, so I'd say the statistics on your table are way off. It could be that the autovac launcher is tuned poorly so it doesn't run often enough, or it could be that select parts of your data change very rapidly (did you just insert 21899 rows with s.solution = 12642 immediately before running the query?) and so the stats can't be kept accurate enough.
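
You can check how fresh the statistics are directly; this is just the standard pg_stat_user_tables view, with the table names taken from the question:

-- How many rows have changed since the last ANALYZE, and when it last ran
SELECT relname, n_live_tup, n_mod_since_analyze,
       last_analyze, last_autoanalyze, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('rm_o_resource_usage_instance_splits_new',
                  'rm_o_resource_usage',
                  'scheduledactivities');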

  ->  Nested Loop ...
        Join Filter: (s.activity_index = sa.activity_index)
        Rows Removed by Join Filter: 705476285
        ->  ...
        ->  Index Scan using scheduled_activities_idx on public.scheduledactivities sa  (cost=0.42..4.60 rows=1 width=16) (actual time=0.006..4.612 rows=32216 loops=21899)
              Output: sa.usedresourceset, sa.activity_index, sa.solution_id
              Index Cond: (sa.solution_id = 12642)

If you can't get it to use the Hash Join, you can at least reduce the harm of the Nested Loop by building an index on scheduledactivities (solution_id, activity_index). That way the activity_index criterion could be part of the Index Condition, rather than being a Join Filter. You could probably then drop the index exclusively on solution_id, as there is little point in maintaining both indexes.
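
For concreteness, something like this (the index name is just an example):

CREATE INDEX scheduledactivities_solution_activity_idx
    ON public.scheduledactivities USING btree
    (solution_id ASC NULLS LAST, activity_index ASC NULLS LAST)
    TABLESPACE pg_default;

-- the plain solution_id index is then largely redundant
DROP INDEX public.scheduled_activities_idx;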

jjanes
  • The vacuum we use is set to launch immediately upon our application launching. In the case when users leave the application always open, it runs every 24 hours (`VACUUM ANALYZE` is the exact command). I did insert a few thousand rows prior to getting that result; however, the table already had several hundred thousand rows. These three tables will actually have thousands of inserts and deletes per day, then will be idle when doing the above query. The query usually works fine, then suddenly takes a long time. I can try increasing the frequency of vacuum analyze and I'll add that index. – ChrisMM Feb 15 '20 at 15:54
  • Just to clarify, at any time there's usually at least 50,000 entries in the tables. – ChrisMM Feb 15 '20 at 16:00
  • @ChrisMM If you inserted several thousand rows all with the same value and you then query for that value, you have changed the data distribution very substantially, even if there were already a million pre-existing rows, none or few of which contained that value. Also, I had assumed your tables had millions of rows. If you just inserted 21899 rows into a table with only 50000, auto-analyze should kick in quite quickly, unless you have it disabled or configured strangely. – jjanes Feb 15 '20 at 17:53
  • 50k is the minimum there would be. You're correct, there's generally millions. The auto-analyze is the default value after an install, I've never changed it. I know the data did change in this case, but just doing inserts shouldn't make the DB think there is only one row? Seems odd that the planner expects one match in a table with hundreds of thousands or millions of rows. Or maybe I am misunderstanding how the planner works. The issue does also occur if I don't make changes to the data. It is just easier to reproduce by adding more data. – ChrisMM Feb 15 '20 at 19:08
  • The question is, how many rows were there with `s.solution = 12642` at the time of the last ANALYZE? I am guessing that the database (since the last analyze) thought there was one (or zero) rows with `s.solution = 12642` and this was accurate at the time. Back when that was accurate, the plan it chose would have been OK. Inserting more rows with skewed data didn't change the plan structure (as it is using old stats), it just made that plan become slow to execute. It is possible but less likely that you hit a problem with the ANALYZE and there never was just one row meeting that criterion. – jjanes Feb 15 '20 at 19:43
  • Thanks for the explanation, I think I understand now. Short of running `vacuum analyze` a lot more often, is there a way to trick it into changing its plan? To usually assume that there's more than one row? – ChrisMM Feb 15 '20 at 19:49
  • Sorry, I can't think of a way for this query to trick it into doing the right thing. – jjanes Feb 23 '20 at 00:57
  • No worries. I've added a check prior to the query to analyze those three tables as needed. I put in `SELECT n_tup_ins FROM pg_stat_user_tables WHERE relname = 'scheduledactivities'`, and if `n_tup_ins` has changed since the last call, then it runs `analyze scheduledactivities; analyze rm_o_resource_usage; analyze rm_o_resource_usage_instance_splits_new;`. It adds only about .5 seconds when it has to analyze, which is much better than the 2+ minutes I was getting. – ChrisMM Feb 24 '20 at 11:54

The SQL statement of the fast plan uses WHERE r.schedule_id = 10 and returns about 22000 rows (with an estimate of 105). The SQL statement of the slow plan uses WHERE r.schedule_id = 12642 and returns about 21000 rows (with an estimate of only 1).

The slow plan uses nested loops instead of hash joins, probably because of a bad join estimate: the estimated row count is 1 but the actual row count is 21899. For example, in this step:

Nested Loop  (cost=0.42..9.74 rows=1 width=110) (actual time=0.091..227.705 rows=21899 loops=1)

If the data does not change, there may be a statistics issue (skewed data) for some columns.
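
One way to see what the planner currently believes about the distribution of the filtered columns is the standard pg_stats view (table and column names taken from the question):

SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE (tablename = 'rm_o_resource_usage_instance_splits_new' AND attname = 'solution')
   OR (tablename = 'rm_o_resource_usage' AND attname = 'schedule_id')
   OR (tablename = 'scheduledactivities' AND attname = 'solution_id');

If the value being queried for does not appear there even though it now dominates the table, the statistics are stale and a fresh ANALYZE should correct the estimate.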

pifor
  • Sorry, perhaps some poor phrasing in the question by me. The tables' data can change, but in the case of the randomness it doesn't need to change in order to randomly be slower. Between the two example outputs I was using different DBs. For the second output, I ran the same query from the application around 30 times before I managed to get it to reproduce, without the data in the relevant tables changing. – ChrisMM Feb 15 '20 at 14:21
  • I didn't notice the difference at first, but yes, the second one does seem to use a Nested Loop, though I'm not sure why it would. – ChrisMM Feb 15 '20 at 14:28