
I have an SQL query that I'm trying to speed up by adding an index or changing the query itself.

However, the execution plan says I have slow HashAggregate and Sort operations.

Here's the visualised execution plan: https://explain.dalibo.com/plan/09c7d4a2dgcc29ec

Here's my table DDL:

CREATE TABLE notification
(
    id        bigint                   NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    body      jsonb                    NOT NULL,
    "type"    text                     NOT NULL,
    userid    int                      NOT NULL,
    isread    boolean                  NOT NULL DEFAULT FALSE,
    platform  text                     NOT NULL,
    "state"   text                     NOT NULL,
    createdat timestamp WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updatedat timestamp WITH TIME ZONE NOT NULL DEFAULT NOW(),
    CONSTRAINT notification_pkey PRIMARY KEY (id, platform),
    CONSTRAINT fk_notification_userid FOREIGN KEY (userid) REFERENCES "user" (id)
) PARTITION BY LIST (platform);

CREATE TABLE notification_a PARTITION OF notification FOR VALUES IN ('a');
CREATE TABLE notification_b PARTITION OF notification FOR VALUES IN ('b');
CREATE TABLE notification_default PARTITION OF notification DEFAULT;
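
As a quick sanity check that partition pruning kicks in (using the same userid as in the query below), an EXPLAIN against the parent table should show only notification_a being scanned:

-- Pruning check: with a filter on the partition key, the plan should
-- only touch the notification_a partition.
EXPLAIN (COSTS OFF)
SELECT id
FROM notification
WHERE platform = 'a'
  AND userid = 23053;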

And here's my SQL query:

--EXPLAIN (ANALYZE, COSTS, BUFFERS, VERBOSE, FORMAT TEXT)
WITH grouped AS (
    SELECT CASE
               WHEN type = 'newContent'
                    AND COUNT(*) FILTER (WHERE type = 'newContent') OVER () >= 3
                   THEN JSONB_AGG(
                            JSONB_BUILD_OBJECT(
                                'id', id,
                                'body', body,
                                'createdAt', createdat,
                                'type', type
                            )
                        ) FILTER (WHERE type = 'newContent') OVER ()
               ELSE JSONB_BUILD_OBJECT(
                        'id', id,
                        'body', body,
                        'createdAt', createdat,
                        'type', type
                    )
           END AS "notification"
    FROM notification_a
    WHERE userid = 23053
      AND NOT isread
      AND state = 'sent'
    ORDER BY createdat DESC
)
SELECT CASE
           WHEN JSONB_TYPEOF(notification) = 'array'
                AND notification #>> '{0,type}' = 'newContent'
               THEN JSONB_BUILD_OBJECT(
                        'body', JSONB_BUILD_OBJECT(
                            'contentCount', JSONB_ARRAY_LENGTH(notification),
                            'department', notification #>> '{0,body,department}'
                        ),
                        'createdAt', NOW(),
                        'type', 'newContentCompact',
                        'group', notification
                    )
           ELSE notification
       END,
       COUNT(*) OVER () AS total_count
FROM grouped
GROUP BY notification
ORDER BY (notification ->> 'createdAt')::timestamptz DESC
OFFSET 0 LIMIT 100;

First, I partitioned the table by the platform key because platform a has a much larger number of notifications than platform b.

Then, I tried different index configurations and settled on the one below, which gave the best performance gain:

CREATE INDEX IF NOT EXISTS notification_search_index
    ON notification (userid, isread, state, createdat DESC)
    WHERE state = 'sent' AND NOT isread;
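
For reference, since the partial predicate already pins isread and state to constants, a leaner variant of the same index (the name here is just illustrative) should serve this query equally well:

-- Sketch: isread and state are fixed by the WHERE clause, so they add
-- nothing as key columns; (userid, createdat DESC) covers the equality
-- filter plus the ORDER BY.
CREATE INDEX IF NOT EXISTS notification_search_index_slim
    ON notification (userid, createdat DESC)
    WHERE state = 'sent' AND NOT isread;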

However, I couldn't get rid of the sort on the notification_a.createdat key, despite that column being in the index.


The table currently has ~10 million rows and will eventually hold hundreds of millions of rows.

I realise that it might be the query itself that is not very "optimisable", but I couldn't find a better way to achieve what I need:

  • Get paginated notifications for a specific platform.
  • If there are more than two notifications of the same type (and that type is newContent), they should be grouped together in the final result (hence the use of JSONB_AGG). The only values actually needed for the group are type, body ->> 'department', and the size of the group (a toy sketch of just this aggregate follows the example below).

For example, if I have rows

id: 1, type: 'a'
id: 2, type: 'a'
id: 3, type: 'a'
id: 4, type: 'b'
id: 5, type: 'c'
id: 6, type: 'd'

and if I do a SELECT ... OFFSET 0 LIMIT 3, I should receive 3 rows:

count: 3, type: 'a'
id: 4, type: 'b'
id: 5, type: 'c'
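
To show just the grouping rule in isolation, here's a toy sketch of the aggregate part (the HAVING threshold encodes the "more than 2" rule, and only the values the group really needs are selected):

-- Toy sketch: compute the compact group for newContent on its own.
SELECT type,
       COUNT(*)                   AS content_count,
       MIN(body ->> 'department') AS department
FROM notification_a
WHERE userid = 23053
  AND NOT isread
  AND state = 'sent'
  AND type = 'newContent'
GROUP BY type
HAVING COUNT(*) >= 3;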

Could you help me optimise my query and perhaps find a better (more performant) way to achieve what I need? The resulting jsonb is not that important; I can build the JSON at the application level. That's just the only way I found to make the query do the grouping.


Update 1

Running ALTER ROLE current_user SET work_mem TO '128MB'; gave a ~2x speed-up.

Here's the new execution plan: https://explain.dalibo.com/plan/e231h440b2943ga2
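
For anyone trying to reproduce this: work_mem can also be scoped to a single session or transaction instead of the whole role, which is a safer way to experiment:

-- Session-scoped: affects only the current connection.
SET work_mem = '128MB';

-- Transaction-scoped: SET LOCAL reverts automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL work_mem = '128MB';
-- ... run the query here ...
COMMIT;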

Comments:

    Sort Method: external merge Disk: 69344kB: Your work_mem is too small. What is your current setting and what happens when you increase work_mem? – Frank Heikens Mar 16 '23 at 14:35
  • @FrankHeikens increasing work_mem to 128MB gave the ~2x speed-up. Previously it was the default 4MB. (added `Update 1` at the end of the question with the new execution plan). Could you please tell me if it is safe to run `ALTER ROLE current_user SET work_mem TO '128MB';` in production? Are there any consequences or caveats one should know about? – Denis Yakovenko Mar 16 '23 at 14:53
