2

I'm building a scheduling system where I store an initial appointment and how often it repeats. My table looks something like this:

CREATE TABLE (
    id serial primary key,
    initial_timestamp timestamp not null,
    recurring interval
);

id        initial_timestamp          recurring
27        2020-06-02                 3 weeks
24        2020-06-03                 10 days

Assuming I can handle the time component, and that the only intervals we'll run across are days and weeks, how can I find the when those two appointments will overlap? For example, the previous example will overlap on June 23rd. It's 3 weeks from June 2nd and 20 days from June 3rd, so the first appointment will repeat once on that day and the second appointment will repeat on the 13th and then the 23rd.

In my program, I have another date, say June 7th with a recurring interval of 12 days. What query can I use to find the time it will take for a recurring appointment starting on June 7th to overlap with every existing recurring appointment? So for example, this appointment will repeat on June 19, July 1, and July 13. Appointment #24 from the table above will repeat on June 13, June 23, July 3, and July 13, if my math is right. I'd like my query comparing this appointment to appointment #24 to return, first of all, July 13th, then also how long it would take to repeat again, which I assume would be like finding the least common multiple of the two intervals, in this case, 60 days (LCM of 12 and 10). So I could expect it to repeat again on July 13 + 60 days = Sept 11.

I tried using generate_series, but since I don't know the size of the intervals, the series would have to continue infinitely, right? It's probably not the best choice here. I assume the answer would have more to do with the math of multiplying intervals somehow.

Note that recurring can be null, so I'd assume there has to be something like WHERE recurring IS NOT NULL in there somewhere. Another thing to note: no initial appointments overlap. I've already guarded against that. The search term doesn't overlap with any of the appointment's initial times either.

If it helps at all, I'm using PHP 5.3 to send queries to Postgres 9.4 (I know, it's an ancient setup). I'd prefer to do most of this in SQL just because most of the other logic is in SQL right now, so I can just run the query and start manipulating the results with PHP.

So in summary, if my math is right, what Postgres query should I use with the table above to compare a given date and interval with every date and interval pair from the table to find the next date those two overlap and how far apart each overlap instance would be?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Jaden Baptista
  • 656
  • 5
  • 16
  • What would happen to an initial appointment on 2020-01-31 and an interval of 1 month, anyway? (In Postgres: `SELECT timestamp '2020-01-31 0:0' + interval '1 month'; -- 2020-02-29 00:00:00`) If you want to exclude months, then please edit to remove it from the question. Instead, add an exact table definition (`CREATE TABLE` statement) showing data types and constraints, and (always) your version of Postgres. – Erwin Brandstetter Jun 03 '20 at 22:13
  • Also: `the previous example will overlap on June 23rd` - what about the *time* component, which does not match? – Erwin Brandstetter Jun 03 '20 at 22:20
  • @ErwinBrandstetter Thanks for the tips. I added the table definition and removed the nonsense month constraint. Per your second question, I said `Assuming I can handle finding overlapping times in a single day`, so you shouldn't have to worry about that. I added a bit more information to hopefully make the objective clearer. – Jaden Baptista Jun 03 '20 at 22:28
  • Months are not necessarily nonsense, they just exhibit "special" behavior. And if the time component needs to match, you might just say so, no point in cutting that out. Might matter for the best query. – Erwin Brandstetter Jun 03 '20 at 22:32
  • `recurring can be null` - that would be non-recurring apointments? Also, initial appointment cannot match the search term? – Erwin Brandstetter Jun 03 '20 at 22:34
  • @ErwinBrandstetter The time component does need to overlap somehow, but I thought I'd get better answers if I focused on the general problem, not the specific details of the problem in my case. I didn't want to get closed as "too specific". Also, yes, those would be non-recurring appointments. The query would probably need to include something like `WHERE recurring IS NOT NULL`, and I can put my time-overlapping logic in that WHERE clause – Jaden Baptista Jun 03 '20 at 22:40
  • Non-recurring appointments can still overlap given the same initial timestamp. – Erwin Brandstetter Jun 03 '20 at 22:41
  • @EdwinBrandstetter No initial appointments overlap. – Jaden Baptista Jun 03 '20 at 22:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215263/discussion-between-erwin-brandstetter-and-jaden-baptista). – Erwin Brandstetter Jun 03 '20 at 22:43

1 Answers1

2

This was hard.

WITH RECURSIVE moving_target(initial_timestamp, recurring) AS (
   VALUES (timestamp '2020-06-07', interval '12 days')  -- search term
   )
,  x AS (         -- advance to the closest day before or at moving target
   SELECT t.id
        , t_date + ((m_date - t_date) / t_step) * t_step AS t_date
        , t_step
        , m.*
   FROM  (        -- normalize table data
      SELECT id
           , initial_timestamp::date AS t_date
           , EXTRACT ('days' FROM recurring)::int AS t_step
      FROM   tbl
      WHERE  recurring IS NOT NULL  -- exclude!
      ) t
   CROSS  JOIN (  -- normalize input
      SELECT initial_timestamp::date AS m_date
           , EXTRACT ('days' FROM recurring)::int AS m_step
      FROM   moving_target
      ) m
   )
, rcte AS (       -- recursive CTE
   SELECT id, t_date, t_step, m_date, m_step
        , ARRAY[m_date - t_date] AS gaps        -- keep track of gaps
        , CASE
            WHEN t_date = m_date     THEN true  -- found match
            WHEN t_step % m_step = 0 THEN false -- can never match
            WHEN (m_date - t_date) % 2 = 1      -- odd gap ...
             AND t_step % 2 = 0                 -- ... but even steps
             AND m_step % 2 = 0      THEN false -- can never match
         -- WHEN <stop conditions?>  THEN false -- hard to determine!
         -- ELSE                          null  -- keep searching
          END AS match
   FROM   x

   UNION ALL
   SELECT id, t_date, t_step, m_date, m_step
        , gaps || m_date - t_date
        , CASE
            WHEN t_date = m_date                THEN true
            WHEN (m_date - t_date) = ANY (gaps) THEN false  -- gap repeated!
         -- ELSE                                     null   -- keep searching
          END AS match
   FROM  (
      SELECT id
           , t_date + (((m_date + m_step) - t_date) / t_step) * t_step AS t_date
           , t_step
           , m_date + m_step AS m_date -- + 1 step
           , m_step
           , gaps
      FROM   rcte
      WHERE  match IS NULL
      ) sub
   )
SELECT id, t.initial_timestamp, t.recurring
     , CASE WHEN r.match THEN r.t_date END AS match_date
FROM   rcte r
JOIN   tbl  t USING (id)
WHERE  r.match IS NOT NULL;

db<>fiddle here - with more test rows

There may be potential to improve further. The core problem is in the realm of
prime factorization. As it seems reasonable to expect fairly small intervals, I solved it by testing for cycles: If, while incrementally stepping forward, a gap between dates is detected that we have seen before, and dates didn't overlap yet, they will never overlap and we can stop. This loops at most GREATEST(m_step, t_step) times (the number of days in the bigger interval), so it shouldn't scale terribly.

I identified some basic mathematical stop conditions to avoid looping in hopeless cases a priori. There may be more ...

Explaining everything that's going on here is more work than devising the query. I added comments that should explain basics ...

Then again, while intervals are small, a "brute force" approach based on generate_series() may still be faster.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Wow Erwin! This is a lot. I'm not so great as Postgres, so dissecting this is blowing my mind a bit. I have been working on an implementation in PHP, but I'm going to try to use a modified version of your solution to keep all the logic in SQL. Thank you very much! – Jaden Baptista Jun 04 '20 at 01:34
  • @JadenBaptista: I had a bug (switched `t_date` & `t_days` in calculation of the gap). Fixed and renamed `t_days` to `t_step` to better tell them apart. – Erwin Brandstetter Jun 04 '20 at 14:24