-1

I am looking to figure out if I can solve the following problem in SQL, or if I'm better off selecting the values into my scripting language and just bulk update from there.

There are some points in time, and there are some time intervals defined by the center of the time interval and a maximum duration from the center, let it be 10 minutes for all of them. Centers may be at any duration from each other, points may be at any duration from each other. Looking to select all time intervals, together with one or zero points, so that each point is either not assigned or assigned to only one interval. If one point matches more than one interval, or vice versa, points shall be chosen so that the total duration between points and interval centers is minimized.

Sample data

interval
id centertime
1 2001-01-01 12.00     # starts at 11.50 ends at 12.10
2 2001-01-01 12.15     # starts at 12.05 ends at 12.25
3 2001-01-01 12.20     # starts at 12.10 ends at 12.30

point
id time
21 2001-01-01 12.00     
22 2001-01-01 12.11
23 2001-01-01 12.17
24 2001-01-01 12.19

Desired results:

interval_id point_id
1 21
2 23
3 24

Explanation

Point 21 exactly matches center of interval 1, and nothing else, so is assigned.

Point 23 is closer to interval 2 than 3, but point 24 is even closer to 3, so interval 3 is assigned point 24.

Point 22 is the closest remaining point to interval 2, so is assigned.

Point 21 is within interval 2, but point 22 is available and closer, so 21 is not assigned to an interval and does not appear in results.

point 23 is even closer to 3, so 22 is the closest remaining one

cmc
  • 4,294
  • 2
  • 35
  • 34
  • 2
    Sample data, desired results, a clear explanation (time stamps, time periods, overlaps . . . it's quite confusing) and an appropriate database tag would all help. – Gordon Linoff Jul 01 '19 at 12:14
  • @GordonLinoff Edited as to your valued feedback. Problem is not trivial but I hope it is clearly defined now. – cmc Jul 01 '19 at 12:57
  • If it wasn't for the "*points should be taken to minimize the total duration*" you could use something like this: https://rextester.com/XXACM88895 –  Jul 01 '19 at 13:48
  • @a_horse_with_no_name very cool, I had no idea about lateral joins. I modified your cool query so it would do what I want if all lines that contain a repeated interval-id were removed. update/on-conflict might work but can it be done in the select? https://rextester.com/LVEL85019 – cmc Jul 01 '19 at 14:58
  • @a_horse_with_no_name got a solution combining your lateral join with window functions to get rid of unneeded matches. Thanks!!! – cmc Jul 02 '19 at 16:33

1 Answers1

0

Okay I got it.

It uses a lateral join to calculate the duration from each centertime to each point, and wraps that in an additional SELECT to get only the closest match using an ordered window function.

SELECT * FROM (
    SELECT
        i.id as interval_id,
        p.id as point_id,
        p.duration,
        ROW_NUMBER() OVER (PARTITION BY i.id ORDER BY duration) AS rownumber
    FROM "interval" i
    LEFT JOIN LATERAL (
        SELECT
            p1.id,
            p1."time",
            ABS(EXTRACT(EPOCH FROM i.centertime - p1."time")) as duration
        FROM "point" p1
        WHERE p1."time"
            BETWEEN i.centertime - interval '10 minute'
            AND i.centertime + interval '10 minute'
     ) p on true)
     AS q
WHERE rownumber=1;
cmc
  • 4,294
  • 2
  • 35
  • 34
  • Didn't got it after all- I ended up with a solution combining this approach with a recursive CTE in order not to repeat assignments. – cmc Aug 15 '19 at 12:44