0

Given the following tables

buckets

metric_id|start_date         |bucket
------------------------------------
a        |2019-12-05 00:00:00|1
a        |2019-12-06 00:00:00|2
b        |2021-10-31 00:00:00|1
b        |2021-11-01 00:00:00|2

points

point_id|metric_id|timestamp
----------------------------
1       |a        |2019-12-05 00:00:00
2       |a        |2019-12-06 00:00:00
3       |b        |2021-10-31 00:00:00
4       |b        |2021-11-01 00:00:00

And the following query

select
       p.metric_id,
       bucket
from points p
left join width_bucket(p.timestamp, (select array(select start_date
                                                  from buckets b
                                                  where b.metric_id = p.metric_id -- correlated sub-query
                                                  ))) as bucket on true

Output

metric_id|bucket
-----------------
a        |1
a        |2
b        |1
b        |2

How can I remove the correlated sub-query to improve the performance?

Currently ~280,000 points * ~650 buckets = ~180,000,000 loops = very slow!

Basically I want to remove the correlated sub-query and apply the width_bucket function only once per unique metric_id in buckets, so that the performance is improved and the function is still given the correct time series data.

How can this be done in Postgres 13?

eshirvana
  • 23,227
  • 3
  • 22
  • 38
Dan
  • 3
  • 1

2 Answers2

0

you can rewrite your query :

select
    p.metric_id,
    width_bucket(p.timestamp,array_agg(b.start_date)) bucket
from points p
left join buckets b on b.metric_id = p.metric_id
group by p.metric_id, p.timestamp

also adding index on buckets.start_date & points ( metric_id, timestamp) would help alot.

eshirvana
  • 23,227
  • 3
  • 22
  • 38
0

You can use a cte to aggregate buckets first

with buckets_arr as (
   select  metric_id, array_agg(start_date order by start_date) arrb
   from buckets
   group by metric_id
)
select
       p.metric_id,
       width_bucket(p.timestamp, ba.arrb) bucket
from points p
join buckets_arr ba on p.metric_id = ba.metric_id
Serg
  • 22,285
  • 5
  • 21
  • 48