
I have a query that generates a trend response: counts of devices per day over a date range. The query scans almost 500k rows. The table has almost 17.5 million records. I have partitioned the table on id so that the query only has to look at a single partition, but it is still quite slow. Each partition has almost 200k records. Any ideas on how to improve the performance of this?
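
For context, here is a hypothetical sketch of the layout described above. The column list and types are assumptions (only the columns visible in the query and plan are real), but the partition name matches the id literal that appears in the plan further down.

CREATE TABLE performance.tenant_based (
    id         varchar NOT NULL,
    tag        text,
    status     varchar,
    score      numeric,
    start_date timestamp,
    end_date   timestamp
    -- ... plus the remaining columns (16 in total, per the comments below)
) PARTITION BY LIST (id);

-- One partition per id; this tenant's partition shows up in the plan below.
CREATE TABLE performance.ita68f0c03880e4c6694859dfa74f1cdf6
    PARTITION OF performance.tenant_based
    FOR VALUES IN ('ita68f0c03880e4c6694859dfa74f1cdf6');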

Query

select start_date, end_date, average, fail_count, warning_count, pass_count
from   (
   select generate_series(timestamp '2021-01-18 00:00:00', timestamp '2021-02-12 00:00:00', interval '1 day')::date
   ) t(start_date)
left   join (
   SELECT start_date, end_date, avg(score) AS average
        , count(*) FILTER (WHERE status = 'Fail')    AS fail_count
        , count(*) FILTER (WHERE status = 'Warning') AS warning_count
        , count(*) FILTER (WHERE status = 'Pass')    AS pass_count
   FROM   performance.tenant_based scd
   JOIN   performance.hierarchy dh ON dh.id = scd.id AND dh.tag = scd.tag
   WHERE  dh.parent_id IN (0,1,2,3,4,5,6,7,8,9,10)
   AND    dh.child_id  IN (0,1,2,3,4,5,6,7,8,9,10)
   AND    dh."desc" IN ('test')
   AND    dh.id = 'ita68f0c03880e4c6694859dfa74f1cdf6'
   AND    start_date >= '2021-01-18 00:00:00'  -- same date range as above
   AND    start_date <= '2021-02-12 00:00:00'
   GROUP  BY 1, 2
   ) s USING (start_date)
ORDER  BY 1;

The query plan is below:

Sort  (cost=241350.02..241850.02 rows=200000 width=104) (actual time=3453.888..3453.890 rows=26 loops=1)
  Sort Key: (((((generate_series('2021-01-18 00:00:00'::timestamp without time zone, '2021-02-12 00:00:00'::timestamp without time zone, '1 day'::interval)))::date))::timestamp without time zone)
  Sort Method: quicksort  Memory: 28kB
  ->  Merge Left Join  (cost=201014.95..212802.88 rows=200000 width=104) (actual time=2901.012..3453.867 rows=26 loops=1)
        Merge Cond: ((((generate_series('2021-01-18 00:00:00'::timestamp without time zone, '2021-02-12 00:00:00'::timestamp without time zone, '1 day'::interval)))::date) = scd.start_date)
        ->  Sort  (cost=79.85..82.35 rows=1000 width=4) (actual time=0.015..0.024 rows=26 loops=1)
              Sort Key: (((generate_series('2021-01-18 00:00:00'::timestamp without time zone, '2021-02-12 00:00:00'::timestamp without time zone, '1 day'::interval)))::date)
              Sort Method: quicksort  Memory: 26kB
              ->  Result  (cost=0.00..20.02 rows=1000 width=4) (actual time=0.003..0.009 rows=26 loops=1)
                    ->  ProjectSet  (cost=0.00..5.02 rows=1000 width=8) (actual time=0.002..0.006 rows=26 loops=1)
                          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
        ->  Materialize  (cost=200935.11..209318.03 rows=40000 width=72) (actual time=2900.992..3453.789 rows=25 loops=1)
              ->  Finalize GroupAggregate  (cost=200935.11..208818.03 rows=40000 width=72) (actual time=2900.990..3453.771 rows=25 loops=1)
                    Group Key: scd.start_date, scd.end_date
                    ->  Gather Merge  (cost=200935.11..207569.38 rows=49910 width=72) (actual time=2879.365..3453.827 rows=75 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Partial GroupAggregate  (cost=199935.08..200808.51 rows=24955 width=72) (actual time=2686.465..3228.313 rows=25 loops=3)
                                Group Key: scd.start_date, scd.end_date
                                ->  Sort  (cost=199935.08..199997.47 rows=24955 width=25) (actual time=2664.518..2860.477 rows=1666667 loops=3)
                                      Sort Key: scd.start_date, scd.end_date
                                      Sort Method: external merge  Disk: 59840kB
                                      ->  Hash Join  (cost=44891.11..198112.49 rows=24955 width=25) (actual time=111.653..1817.228 rows=1666667 loops=3)
                                            Hash Cond: (scd.tag = (dh.tag)::text)
                                            ->  Append  (cost=0.00..145159.33 rows=2083333 width=68) (actual time=0.006..591.818 rows=1666667 loops=3)
                                                  ->  Parallel Seq Scan on ita68f0c03880e4c6694859dfa74f1cdf6 scd  (cost=0.00..145159.33 rows=2083333 width=68) (actual time=0.006..455.525 rows=1666667 loops=3)
                                                        Filter: ((start_date >= '2021-01-18 00:00:00'::timestamp without time zone) AND (start_date <= '2021-02-12 00:00:00'::timestamp without time zone) AND ((id)::text = 'ita68f0c03880e4c6694859dfa74f1cdf6'::text))
                                            ->  Hash  (cost=44638.71..44638.71 rows=20192 width=45) (actual time=111.502..111.502 rows=200000 loops=3)
                                                  Buckets: 65536 (originally 32768)  Batches: 8 (originally 1)  Memory Usage: 3585kB
                                                  ->  Bitmap Heap Scan on hierarchy dh  (cost=1339.01..44638.71 rows=20192 width=45) (actual time=26.542..62.078 rows=200000 loops=3)
                                                        Recheck Cond: (((id)::text = 'ita68f0c03880e4c6694859dfa74f1cdf6'::text) AND (parent_id = ANY ('{0,1,2,3,4,5,6,7,8,9,10}'::integer[])) AND (child_id = ANY ('{0,1,2,3,4,5,6,7,8,9,10}'::integer[])) AND ((desc)::text = 'test'::text))
                                                        Heap Blocks: exact=5717
                                                        ->  Bitmap Index Scan on hierarchy_id_region_idx  (cost=0.00..1333.96 rows=20192 width=0) (actual time=25.792..25.792 rows=200000 loops=3)
                                                              Index Cond: (((id)::text = 'ita68f0c03880e4c6694859dfa74f1cdf6'::text) AND (parent_id = ANY ('{0,1,2,3,4,5,6,7,8,9,10}'::integer[])) AND (child_id = ANY ('{0,1,2,3,4,5,6,7,8,9,10}'::integer[])) AND ((desc)::text = 'test'::text))
Planning time: 0.602 ms
Execution time: 3463.440 ms
Ash
  • Can you export an anonymized version of the data? – bobflux Mar 02 '21 at 18:10
  • No. The tables have only text,varchar,int4 and timestamp. First table has 16 columns and second table has 15 columns. First table is partitioned, second table is not. – Ash Mar 02 '21 at 19:06
  • Do you have indexes on `status`, `service_tag` and `desc`? – IVO GELOV Mar 02 '21 at 19:26
  • When using parameters for generate_series that are not integers, the planner gets the row estimates wrong. In your case, it expects 1000 rows, which is the usual default for a function scan, but it gets only 26 rows. More details [here](https://stackoverflow.com/questions/53239669/postgresql-generate-date-series-performance/66279403#66279403). Maybe if you rewrite it with integer parameters as per the previous link, it could come up with a better plan (a rewrite along these lines is sketched after these comments). – bobflux Mar 02 '21 at 19:46
  • https://explain.depesz.com/s/s1Xt – bobflux Mar 02 '21 at 19:52
  • With depesz.com it's easier to see the problem is the sort. Why is it using merge? Is work_mem set to a very low value? – bobflux Mar 02 '21 at 19:53
  • Also the row estimates on the aggregate are very wrong; that's why it is using a sort instead of a HashAggregate, which would be a lot faster... – bobflux Mar 02 '21 at 19:55
  • Did you censor the "Rows Removed by Filter" lines from your execution plans, or are there really none of them? – jjanes Mar 02 '21 at 20:44
  • "Each partition has almost 200k records" But it is finding 1,666,667 records in 1/3 of the partition, so that is 25 times more than 200k (and an oddly round number to boot). – jjanes Mar 02 '21 at 20:46

1 Answer


After several rounds of trial and error we landed on a materialized view for this query. The query was scanning more than 500k rows, and neither indexes nor partitioning helped. We reworked the query above into a materialized view and now run a plain SELECT on top of it (a sketch of the refresh-and-select pattern follows the view definition). Response time is down to 96 ms. The generalized materialized view for the query in this question is shown below.

CREATE MATERIALIZED VIEW performance.daily_trends
TABLESPACE pg_default
AS SELECT s.id,
    d.parent_id,
    d.child_id,  
    d."desc",
    s.start_date,
    s.end_date,
    count(*) FILTER (WHERE s.overall_status::text = 'Fail'::text) AS fail_count,
    count(*) FILTER (WHERE s.overall_status::text = 'Warning'::text) AS warning_count,
    count(*) FILTER (WHERE s.overall_status::text = 'Pass'::text) AS pass_count,
    avg(s.score) AS average_score
   FROM performance.tenant_based  s
     JOIN performance.hierarchy  d ON s.id::text = d.id::text AND s.tag = d.tag::text
  WHERE s.start_date >= (CURRENT_DATE - 45) AND s.start_date <= CURRENT_DATE
  GROUP BY s.id, d.parent_id, d.child_id, d."desc", s.start_date, s.end_date
WITH DATA;
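
The view is then queried directly. Here is a sketch of the refresh-and-select pattern; the index name and the exact query shape are illustrative assumptions, not part of the original setup.

-- Hypothetical index to make per-tenant date-range lookups on the view cheap.
CREATE INDEX daily_trends_id_start_date_idx
    ON performance.daily_trends (id, start_date);

-- Refresh on whatever schedule the rolling 45-day window requires.
REFRESH MATERIALIZED VIEW performance.daily_trends;

-- The trend query now reads precomputed rows (~96 ms in our case).
SELECT start_date, end_date, average_score,
       fail_count, warning_count, pass_count
FROM   performance.daily_trends
WHERE  id = 'ita68f0c03880e4c6694859dfa74f1cdf6'
  AND  start_date BETWEEN DATE '2021-01-18' AND DATE '2021-02-12'
ORDER  BY start_date;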

Thanks to everyone who tried to help with this.

Ash