How can I replace this correlated subquery within a function call?

Question

Given the following tables

buckets

metric_id|start_date         |bucket
------------------------------------
a        |2019-12-05 00:00:00|1
a        |2019-12-06 00:00:00|2
b        |2021-10-31 00:00:00|1
b        |2021-11-01 00:00:00|2

points

point_id|metric_id|timestamp
----------------------------
1       |a        |2019-12-05 00:00:00
2       |a        |2019-12-06 00:00:00
3       |b        |2021-10-31 00:00:00
4       |b        |2021-11-01 00:00:00

And the following query

select
       p.metric_id,
       bucket
from points p
left join width_bucket(p.timestamp, (select array(select start_date
                                                  from buckets b
                                                  where b.metric_id = p.metric_id -- correlated sub-query
                                                  ))) as bucket on true

Output

metric_id|bucket
-----------------
a        |1
a        |2
b        |1
b        |2

How can I remove the correlated sub-query to improve the performance?

Currently ~280,000 points * ~650 buckets = ~180,000,000 loops = very slow!

Basically I want to remove the correlated sub-query and apply the width_bucket function only once per unique metric_id in buckets, so that the performance is improved and the function is still given the correct time series data.

How can this be done in Postgres 13?

you need to provide your execuation plan : `EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)` — eshirvana, Nov 03 '21 at 18:16

eshirvana · Answer 1 · 2021-11-03T18:15:30.627

0

you can rewrite your query :

select
    p.metric_id,
    width_bucket(p.timestamp,array_agg(b.start_date)) bucket
from points p
left join buckets b on b.metric_id = p.metric_id
group by p.metric_id, p.timestamp

also adding index on buckets.start_date & points ( metric_id, timestamp) would help alot.

edited Nov 03 '21 at 18:15

answered Nov 03 '21 at 18:10

eshirvana

23,227
3
22
38

score 0 · Accepted Answer · answered Nov 03 '21 at 18:26

You can use a cte to aggregate buckets first

with buckets_arr as (
   select  metric_id, array_agg(start_date order by start_date) arrb
   from buckets
   group by metric_id
)
select
       p.metric_id,
       width_bucket(p.timestamp, ba.arrb) bucket
from points p
join buckets_arr ba on p.metric_id = ba.metric_id

How can I replace this correlated subquery within a function call?

2 Answers2