SQL How to group a sequence of elements that share a chronological order

Question

I have a set of records that consists of the start and stop dates, as the following:

ID	started	stop
1	2017-08-14	2017-10-22
2	2017-10-23	2017-12-12
3	2019-01-28	2019-02-21->
4	Some-Date	NULL
5	2020-09-08	2020-09-14
6	2020-09-15	2020-10-14
7	->2019-02-22	2019-03-18

I need to merge those sequence of dates that come in a chronological order as heighlighted which follow the rule (stop = start + one day).

The result should look like this:

ID	started	stop
1	2017-08-14	2019-03-18
2	Some-Date	NULL
3	2020-09-08	2020-10-14

My approach: creates 3 new colums and check each entry x with its previous entry (x - 1), and with its next one (x + 1), and if any of those is in between -1 , +1 then group them:

    update date_table as t
    set group_num_before = (select date_difference
                            from (SELECT g1.id, DATEDIFF(g1.start_dt, g2.stop_dt) AS date_difference
                                  FROM date_table g1
                                           INNER JOIN
                                       date_table g2 ON g2.id = g1.id - 1) as groupNum
                            where t.id = groupNum.id),
        group_num_after  = (select date_difference
                            from (SELECT g1.id, DATEDIFF(g1.stop_dt, g2.start_dt) AS date_difference
                                  FROM date_table g1
                                           INNER JOIN
                                       date_table g2 ON g2.id = g1.id + 1) as groupNum
                            where t.id = groupNum.id)
    where true;

    update date_table as g1
        INNER JOIN
        date_table g2 ON g2.id = g1.id
    set g1.group_num = IF(g1.group_num_before in (-1, 1) OR
                          g1.group_num_after in (-1, 1), -1, g1.group_num_before)
    where g1.A = g2.B;

This result in:

ID	started	stop	groupNum
1	2017-08-14	2017-10-22	-1
2	2017-10-23	2017-12-12	-1
3	2017-12-13	2019-02-21->	-1
4	Some-Date	NULL	null
5	2020-09-08	2020-09-14	-1
6	2020-09-15	2020-10-14	-1
7	->2019-02-22	2019-03-18	-1

However, doing this put all the sequenced records in the same group since they also come in a chronological order, and this lead to incorrect result when grouping them together.

Any idea will be appreciated Thanks in advance.

Your sample data row 3 starts 2019-01-28 which makes your outcome impossible. Is that a typo given it later appears as 2017-12-13 ? — P.Salmon, Nov 23 '22 at 11:53
Does each separate column is defined as unique? if not then your table may contain duplicated valuies and the task may have a lot of different solutions. — Akina, Nov 23 '22 at 12:34
@P.Salmon The middle date was irrelevant and was a mistake, I edited it. Thanks — Rody, Nov 23 '22 at 12:42

Thorsten Kettner · Answer 1 · 2022-11-23T17:09:50.627

2

Use LAG to see the previous stop and flag all gaps. Then count the gaps to get the group numbers.

select started, stop, count(flag) over (order by started) as grp
from
(
  select 
    started, stop,
    case when lag(stop) over (order by started) = started - interval 1 day then null else 1 end as flag
  from date_table
) with_flags
order by started;

If you want to get one result row per group number, aggregate above result:

select
  grp,
  min(started) as started,
  case when count(*) = count(stop) then max(stop) end as stop
from
(
  select started, stop, count(flag) over (order by started) as grp
  from
  (
    select 
      started, stop,
      case when lag(stop) over (order by started) = started - interval 1 day then null else 1 end as flag
    from date_table
  ) with_flags
) grouped
group by grp
order by grp;

Demo: https://dbfiddle.uk/jKlHZ09P

edited Nov 23 '22 at 17:09

answered Nov 23 '22 at 16:54

Thorsten Kettner

89,309
7
49
73

What a ingenious way to solve the problem, Thanks. However this approach has couple of problems: 1) if two dates shares same start date but different stop dates, one of them will be removed. 2) merging fails if two similar record exists or at least two records with the same start date (not sure why). Demo: (https://dbfiddle.uk/EGjLWS_U) – Rody Nov 24 '22 at 13:05
But what would two rows with the same start date even mean? If I have `2022-01-01 to 2022-01-02` and then both `2022-01-03 to 2022-01-04` and `2022-01-03 to 2022-01-05`, which of the two latter would prolong the first? As I see this, one of the two rows starting with `2022-01-03` is superfluous and shoudn't even exist. If there are such duplicates and they do have a meaning, then you will have to make your mind up, which rules you want to apply to deal with this and you'll end up with quite an altered task probably. – Thorsten Kettner Nov 24 '22 at 16:19
Thanks for the responce. Those dates simulate the action of taking a drug, there are many records with the same starting date but each with a specific stop date. And yes, you are right about the example, it's a bit ambiguous, however, there are other columns that differentiate these records from each other, e.g. Dose etc. The example that i have shown, is the oppsite of your example. `2019-10-10 to 2020-09-07 2019-10-10 to 2019-12-08 2020-09-08 to 2022-10-20` regadless of the starting date, if there a chronological order as the example it should merge. Thanks in advance. – Rody Nov 24 '22 at 17:11
Well, merging is exactly what my query does. You seem to want the opposite. For readability I use letters instead of dates. You have duplicate starts like A->B and A->C. And with an additional B->D you seem to want to generate two groups A->D and A->C. But if there can be duplicate starts, you could also have an additional B->E. Now there would be two candidates to link to A->B. My query merges the data ending with one A->E covering the whole range. You seem to want to generate two separate groups instead A->D and A->E, where the first part (A->B) would suddenly part of two groups. – Thorsten Kettner Nov 25 '22 at 06:21
But then you would no longer assign each row a single group, but would have to traverse trees. For this you would need an iterative process, which is done with recursve queries in SQL. You should then also clearly define rules, when to join adjacent days. You say there are other columns that would come into play. So the task becomes another one altogether. I suggest you write a completely new request for this, elaborating what rules to apply and showing an example with several chain candidates (same start dates, maybe even same end dates, if these can also exist) and the expected result. – Thorsten Kettner Nov 25 '22 at 06:26
Thanks for the answer. Yes, I want to generate two groups as you mentioned, but I can understand your point, I may not have considered all possible scenarios, e.g. B->E or if there is another line. Thanks for the clarification. I have to check my data and think about possible scenarios and the rules. I will definitely suggest you (if I may) as possible candidates who can answer the question when posting. Thanks again – Rody Nov 25 '22 at 11:59

SQL How to group a sequence of elements that share a chronological order

1 Answers1