
We are on PostgreSQL 12 and have a large table, 108 GB including its index. Since the query below got slow, we tried partitioning the table, but this did not help.

    EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
    SELECT
        val_a AS x,
        val_b AS y,
        SUM(value) AS value
    FROM
        test_table_tp
    WHERE
        client = '767jjDHIPLkshj'
        AND identity_1 = '12edfdijijasd'
        AND identity_2 = '98jjaskhuUUHss'
        AND identity_3 = 1
        AND date_col BETWEEN '2021-04-01'::date AND ('2021-07-01'::date + interval '1 day')::date
    GROUP BY
        val_a,
        val_b;

This takes around 3 minutes to execute.

->  HashAggregate  (cost=260438.41..260718.30 rows=27989 width=16) (actual time=214427.239..214427.614 rows=1298 loops=1)
               Output: test_table_tp.val_a, test_table_tp.val_b, sum(test_table_tp.value)
               Group Key: test_table_tp.val_a, test_table_tp.val_b
               Buffers: shared hit=216750 read=331936
               I/O Timings: read=211977.843
               ->  Append  (cost=0.56..258339.24 rows=279890 width=12) (actual time=3.057..213976.334 rows=722233 loops=1)
                     Buffers: shared hit=216750 read=331936
                     I/O Timings: read=211977.843
                     ->  Index Scan using idx_202104_108521 on public.test_table_tp_p2021_04_108521 test_table_tp  (cost=0.56..71154.06 rows=83535 width=12) (actual time=3.056..69033.315 rows=216908 loops=1)
                           Output: test_table_tp.val_a, test_table_tp.val_b, test_table_tp.value
                           Index Cond: (((test_table_tp.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp.identity_3 = 1) AND (test_table_tp.date_col >= '2021-04-01'::date) AND (test_table_tp.date_col <= '2021-07-02'::date))
                           Buffers: shared hit=68990 read=96437
                           I/O Timings: read=68466.167
                     ->  Index Scan using idx_202105_108553 on public.test_table_tp_p2021_05_108553 test_table_tp_1  (cost=0.56..51361.84 rows=55441 width=12) (actual time=8.641..55685.999 rows=160618 loops=1)
                           Output: test_table_tp_1.val_a, test_table_tp_1.val_b, test_table_tp_1.value
                           Index Cond: (((test_table_tp_1.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_1.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_1.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_1.identity_3 = 1) AND (test_table_tp_1.date_col >= '2021-04-01'::date) AND (test_table_tp_1.date_col <= '2021-07-02'::date))
                           Buffers: shared hit=48314 read=69911
                           I/O Timings: read=55277.406
                     ->  Index Scan using idx_202106_108585 on public.test_table_tp_p2021_06_108585 test_table_tp_2  (cost=0.56..63581.82 rows=66779 width=12) (actual time=2.870..48249.339 rows=188842 loops=1)
                           Output: test_table_tp_2.val_a, test_table_tp_2.val_b, test_table_tp_2.value
                           Index Cond: (((test_table_tp_2.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_2.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_2.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_2.identity_3 = 1) AND (test_table_tp_2.date_col >= '2021-04-01'::date) AND (test_table_tp_2.date_col <= '2021-07-02'::date))
                           Buffers: shared hit=54983 read=90249
                           I/O Timings: read=47732.316
                     ->  Index Scan using idx_202107_108617 on public.test_table_tp_p2021_07_108617 test_table_tp_3  (cost=0.56..70842.08 rows=74135 width=12) (actual time=2.849..40902.561 rows=155865 loops=1)
                           Output: test_table_tp_3.val_a, test_table_tp_3.val_b, test_table_tp_3.value
                           Index Cond: (((test_table_tp_3.client)::text = '767jjDHIPLkshj'::text) AND ((test_table_tp_3.identity_1)::text = '12edfdijijasd'::text) AND ((test_table_tp_3.identity_2)::text = '98jjaskhuUUHss'::text) AND (test_table_tp_3.identity_3 = 1) AND (test_table_tp_3.date_col >= '2021-04-01'::date) AND (test_table_tp_3.date_col <= '2021-07-02'::date))
                           Buffers: shared hit=44463 read=75339
                           I/O Timings: read=40501.954
             Planning Time: 18.081 ms
             Execution Time: 214427.963 ms
 Planning Time: 0.083 ms
 Execution Time: 214461.427 ms
(41 rows)

Time: 214462.391 ms (03:34.462)

This table already contains monthly aggregated data. Further aggregation is not possible, because many different filters can be applied. The identity columns are covered by a unique key on the table.

Can anything be done to further optimize this? Any help is greatly appreciated. Thank you in advance.

  • This is pretty much completely I/O bound, so faster and more disks would help. Or (much) more RAM so that the data can be cached –  Sep 09 '21 at 05:19
  • @a_horse_with_no_name - Our disk has 16000 IOPS with a max throughput of 1000 MB/s, and we have a machine with 128 GB RAM. Checking the usage, none of these have hit the max limit. – success malla Sep 09 '21 at 05:25
  • Any idea why you are not getting parallel query here? – jjanes Sep 09 '21 at 06:23
  • How big can 1 IO be? If 8kb counts as 8 IO, you would be quite close to the specified IOPS. – jjanes Sep 09 '21 at 06:29
  • @jjanes - yeah, we are planning to increase the IOPS now to see if it improves the speed. Regarding parallel query, we think it's the query planner's decision not to use a parallel plan. – success malla Sep 09 '21 at 06:48
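To follow up on jjanes's parallel-query question, the settings that most commonly suppress parallel plans can be inspected directly. A sketch (the values shown are PostgreSQL defaults, not this system's configuration):

```sql
-- Settings that influence whether the planner considers a parallel plan.
SHOW max_parallel_workers_per_gather;  -- 0 disables parallel query entirely
SHOW min_parallel_index_scan_size;     -- index scans smaller than this
                                       -- (per partition) are not parallelized
-- For a one-off test in the current session only:
SET max_parallel_workers_per_gather = 4;
```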

1 Answer


This query will become slower with partitioning, although not much.

Apart from getting a faster disk or more RAM, the only ways to speed up this query are

  1. include client, val_a, val_b and value in the index and VACUUM the table, so that you get a fast index-only scan (client has to be in the index too, since the query filters on it)

    CREATE INDEX ON test_table_tp (
       client,
       identity_1,
       identity_2,
       identity_3,
       date_col,
       val_a,
       val_b,
       value
    );
    
  2. CLUSTER the table using the index, so that the rows are stored in fewer blocks
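Point 2 could look like the following sketch. Note that in PostgreSQL 12, CLUSTER cannot be run on the partitioned table itself, so each partition is clustered using its own index (names taken from the plan above):

```sql
-- Caveat: CLUSTER takes an ACCESS EXCLUSIVE lock, blocking reads and
-- writes on the partition while it runs, and the physical ordering is
-- not maintained for new rows, so it must be repeated periodically.
CLUSTER test_table_tp_p2021_04_108521 USING idx_202104_108521;
CLUSTER test_table_tp_p2021_05_108553 USING idx_202105_108553;
```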

Laurenz Albe
  • We have a unique index with all the columns to support filters that can be applied. Regarding the Cluster table, thank you, will try it out to see if it helps. – success malla Sep 09 '21 at 06:52
  • The suggestion was to include some more columns. – Laurenz Albe Sep 09 '21 at 06:59
  • Thank you, and yeah we already have it in our unique Index. But did not help – success malla Sep 09 '21 at 07:06
  • You do *not* get an index-only scan? – Laurenz Albe Sep 09 '21 at 07:22
  • ya, the query planner chose the other index. I am now thinking about dropping the index currently used and creating a new index with all the columns in the query, to see if it would help. Thank you for showing the way. There is still a question of what would happen when other filters are applied, e.g. when an identity_4 column is used in the WHERE clause. Regarding clustering the table, I am afraid it would not be possible, as millions of upserts are executed each day and we cannot stop reads or writes. :( – success malla Sep 09 '21 at 07:53
  • @successmalla What is the unique index definition? It isn't enough for all the columns to be present, they must also be in the correct order. First the equality-tested ones, then the range-tested one, then the others. – jjanes Sep 09 '21 at 13:01
  • @jjanes - Got it the order might have been the culprit here. Will test this out tomorrow when the database load is low. Thank you. :) – success malla Sep 09 '21 at 16:21
  • @LaurenzAlbe - index only scan worked. Thanks a lot. Whatever the case, trying out adding faster disc next. – success malla Sep 10 '21 at 03:18
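The final comment reports that the index-only scan worked. One way to verify that, along with the visibility-map caveat that goes with it, is a check like this (reusing the query from the question):

```sql
-- Index-only scans skip heap access only for pages marked all-visible,
-- so VACUUM first to update the visibility map (recurses into partitions).
VACUUM test_table_tp;

EXPLAIN (ANALYZE, BUFFERS)
SELECT val_a, val_b, SUM(value) AS value
FROM test_table_tp
WHERE client = '767jjDHIPLkshj'
  AND identity_1 = '12edfdijijasd'
  AND identity_2 = '98jjaskhuUUHss'
  AND identity_3 = 1
  AND date_col BETWEEN '2021-04-01' AND '2021-07-02'
GROUP BY val_a, val_b;
-- Expect "Index Only Scan" nodes with low "Heap Fetches"; a high
-- Heap Fetches count means the visibility map is stale.
```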