Assign Group ID based on time between rows that are dynamically separated (not just 1 row apart)

Question

I have a table I'm using in SQL Server with Patient CSN/BP Reading and the DateTime it was filed.

It would also start over at 1, when a new patient row starts.

Patient CSN	BP Reading	DateTime	GroupID
614	162/92	8/22/2022 12:50 PM	1
614	161/102	8/22/2022 4:05 PM	1
614	160/97	8/22/2022 5:00 PM	2
614	163/05	8/22/2022 5:50 PM	1
297	161/02	8/27/2022 1:00 PM	1
297	168/06	8/27/2022 1:37 PM	2
297	168/99	8/27/2022 7:01 PM	1
297	171/94	8/27/2022 7:16 PM	2
297	166/92	8/27/2022 7:38 PM	3
297	160/98	8/27/2022 8:05 PM	1
297	169/99	8/27/2022 8:22 PM	2
297	161/90	8/27/2022 10:01 PM	1
297	160/98	8/27/2022 10:33 PM	2
297	161/99	8/27/2022 11:31 PM	1
297	161/97	8/27/2022 11:52 PM	2
297	164/96	8/28/2022 4:54 AM	1
297	160/103	8/31/2022 1:00 PM	1

LAG/LEAD but no success because it jumps 1 row.

I need to add a GroupID column based on times between rows (if they are not more than 60 minutes apart). The problem is it's a little tricky because it's not always just 1 row below to check. It could be 1,2, 5, or 10+. So the first patient (Patient CSN 614), their first entry is at 12:50pm so that would obviously be GroupID 1. I then need SQL to go down and essentially search all rows below until it reaches the 60-minute end DateTime of 1:50 PM, and every row that is <= 1:50 PM would get a sequential GroupID 2,3, etc. It would then start over at GroupID = 1 on anything past 1:51 PM (in this case it was at 4:05 PM). The new end time would now be 60 minutes past 4:05 PM. Since the next row is <= 5:05 PM, it would get GroupID 2.

Does this answer your question? [Assign Unique Group ID based on the time an event happened (within 60 minutes of each other)](https://stackoverflow.com/questions/76036759/assign-unique-group-id-based-on-the-time-an-event-happened-within-60-minutes-of) — HABO, Apr 21 '23 at 01:20

Patrick Hurst · Answer 1 · 2023-05-04T15:32:46.263

When asking a question like this it's important to provide easily reproducible DDL/DML. This does not include pictures of data as it cannot be copied easily, and does not describe the data types that are in use. Consider:

DECLARE @Table TABLE (CSN INT, BloodPressure NVARCHAR(7), ReadingDateTime DATETIME);
INSERT INTO @Table (CSN, BloodPressure, ReadingDateTime) VALUES

(614, '147/87', '2022-8-22 13:35:00'), (614, '150/90', '2022-8-22 13:50:00'), (614, '149/89', '2022-8-22 14:05:00'),
(614, '148/88', '2022-8-22 16:20:00'), (297, '140/80', '2022-8-22 12:50:00'), (297, '139/79', '2022-8-22 13:05:00'),
(297, '138/78', '2022-8-22 13:20:00'), (297, '137/77', '2022-8-22 13:35:00'), (297, '140/80', '2022-8-22 13:50:00'),
(614, '148/88', '2022-8-22 14:20:00'), (614, '150/90', '2022-8-22 15:50:00'), (614, '149/89', '2022-8-22 16:05:00'),
(297, '139/79', '2022-8-22 14:05:00'), (297, '138/78', '2022-8-22 14:20:00'), (297, '140/80', '2022-8-22 15:50:00'),
(614, '150/90', '2022-8-22 12:50:00'), (614, '149/89', '2022-8-22 13:05:00'), (614, '148/88', '2022-8-22 13:20:00'),
(297, '139/79', '2022-8-22 16:05:00'), (297, '138/78', '2022-8-22 16:20:00');

Using this example data we can create a rCTE (Recursive Common Table Expression) and use that to get to your desired result set:

;WITH Base AS (
SELECT t.CSN, t.BloodPressure, t.ReadingDateTime, ROW_NUMBER() OVER (PARTITION BY t.CSN ORDER BY ReadingDateTime) AS Rn, 
       CASE WHEN LAG(t.ReadingDateTime,1) OVER (PARTITION BY t.CSN ORDER BY t.ReadingDateTime) IS NULL OR 
                 LAG(t.ReadingDateTime,1) OVER (PARTITION BY t.CSN ORDER BY t.ReadingDateTime) < DATEADD(SECOND,-(59*60)+59,t.ReadingDateTime) THEN 1 
            ELSE 0 
        END AS IsNewGroup
  FROM @Table t
), rCTE AS (
SELECT Base.CSN, Base.BloodPressure, Base.ReadingDateTime, Base.Rn, 1 AS GroupSeqeunce, ROW_NUMBER() OVER (PARTITION BY CSN ORDER BY Rn) AS GroupNumber
  FROM Base
 WHERE IsNewGroup =  1
UNION ALL
SELECT a.CSN, r.BloodPressure, r.ReadingDateTime, r.Rn, a.GroupSeqeunce + 1, a.GroupNumber
  FROM rCTE a
    INNER JOIN Base r
      ON a.CSN = r.CSN
      AND a.Rn = r.Rn - 1
      AND r.IsNewGroup = 0
)

SELECT CSN, BloodPressure, ReadingDateTime, GroupNumber, GroupSeqeunce
  FROM rCTE
 ORDER BY CSN, GroupNumber, GroupSeqeunce

CSN	BloodPressure	ReadingDateTime	GroupNumber	GroupSeqeunce
297	140/80	2022-08-22 12:50:00.000	1	1
297	139/79	2022-08-22 13:05:00.000	1	2
297	138/78	2022-08-22 13:20:00.000	1	3
297	137/77	2022-08-22 13:35:00.000	1	4
297	140/80	2022-08-22 13:50:00.000	1	5
297	139/79	2022-08-22 14:05:00.000	1	6
297	138/78	2022-08-22 14:20:00.000	1	7
297	140/80	2022-08-22 15:50:00.000	2	1
297	139/79	2022-08-22 16:05:00.000	2	2
297	138/78	2022-08-22 16:20:00.000	2	3
614	150/90	2022-08-22 12:50:00.000	1	1
614	149/89	2022-08-22 13:05:00.000	1	2
614	148/88	2022-08-22 13:20:00.000	1	3
614	147/87	2022-08-22 13:35:00.000	1	4
614	150/90	2022-08-22 13:50:00.000	1	5
614	149/89	2022-08-22 14:05:00.000	1	6
614	148/88	2022-08-22 14:20:00.000	1	7
614	150/90	2022-08-22 15:50:00.000	2	1
614	149/89	2022-08-22 16:05:00.000	2	2
614	148/88	2022-08-22 16:20:00.000	2	3

First we're doing some testing on the rows to find the 'Starter' rows. These are rows which are 60 minutes or more away from the previous row or do not have a previous row for the same CSN. We're using LAG with the OVER clause to PARTITION BY the CSN and ORDER BY the ReadingDateTime.

Once we know where to start, we can use an rCTE to iterate over the rows and form groups, and sequence them. The original 'order' of the rows doesn't matter when they're sorted by the windowed functions.

This approach iterates over each row in order to compare the reading time to the current group start time:

;WITH Base AS (
SELECT a.CSN, a.BloodPressure, a.ReadingDateTime, ROW_NUMBER() OVER (PARTITION BY a.CSN ORDER BY a.ReadingDateTime) AS Rn
  FROM @Table a
), rCTE AS (
SELECT a.CSN, a.BloodPressure, a.ReadingDateTime, a.Rn, a.ReadingDateTime AS GroupStartDateTime
  FROM Base a
 WHERE Rn = 1
UNION ALL
SELECT a.CSN, r.BloodPressure, r.ReadingDateTime, r.Rn, CASE WHEN r.ReadingDateTime < DATEADD(SECOND,3599,a.GroupStartDateTime) THEN a.GroupStartDateTime ELSE r.ReadingDateTime END AS GroupStartDateTime
  FROM rCTE a
    INNER JOIN Base r
      ON a.CSN = r.CSN
      AND a.Rn + 1 = r.Rn
)

SELECT t.CSN, t.BloodPressure, t.ReadingDateTime, 
       DENSE_RANK() OVER (PARTITION BY t.CSN ORDER BY t.GroupStartDateTime) AS GroupNumber, 
       ROW_NUMBER() OVER (PARTITION BY t.CSN, t.GroupStartDateTime ORDER BY t.ReadingDateTime) AS GroupSequence
  FROM rCTE t
  ORDER BY t.CSN, t.ReadingDateTime

CSN	BloodPressure	ReadingDateTime	GroupNumber	GroupSequence
297	140/80	2022-08-22 12:50:00.000	1	1
297	139/79	2022-08-22 13:05:00.000	1	2
297	138/78	2022-08-22 13:20:00.000	1	3
297	137/77	2022-08-22 13:35:00.000	1	4
297	140/80	2022-08-22 13:50:00.000	2	1
297	139/79	2022-08-22 14:05:00.000	2	2
297	138/78	2022-08-22 14:20:00.000	2	3
297	140/80	2022-08-22 15:50:00.000	3	1
297	139/79	2022-08-22 16:05:00.000	3	2
297	138/78	2022-08-22 16:20:00.000	3	3
614	150/90	2022-08-22 12:50:00.000	1	1
614	149/89	2022-08-22 13:05:00.000	1	2
614	148/88	2022-08-22 13:20:00.000	1	3
614	147/87	2022-08-22 13:35:00.000	1	4
614	150/90	2022-08-22 13:50:00.000	2	1
614	149/89	2022-08-22 14:05:00.000	2	2
614	148/88	2022-08-22 14:20:00.000	2	3
614	150/90	2022-08-22 15:50:00.000	3	1
614	149/89	2022-08-22 16:05:00.000	3	2
614	148/88	2022-08-22 16:20:00.000	3	3

Thank you, this is very close. When looking at my example above for patient 614, the group sequence (for times 12:50, 16:05, 17:00, 17:50) should be 1, 1, 2, 1. The output on the CTE gives me 1, 1, 2, 3. Any thoughts? — ThisGuyLA13, Apr 21 '23 at 13:09
And looking at the table that you made, for patient 297, the 7th entry at 14:20 is when the grouping should start over since it's >1 hour past 12:50. At that point, 14:20 would be #1 all the way up to 15:20, and anything past 15:20 would start over at 1. The "IsnewGroup" column seems to be just using LAG for 1 previous row, but it's not always 1 row (by looking at the examples above). Make sense? — ThisGuyLA13, Apr 21 '23 at 15:07
@ThisGuyLA13 I can't comment on the results for data which hasn't be provided. If you'd like to provide it, please do and we can go from there. `2022-08-22 14:20` is included in the first group as it occurs within 60 minutes of the previous `2022-08-22 14:05`. The next one for that patient is more than 60 minutes after that (`2022-08-22 15:50`), so it starts a new group. — Patrick Hurst, Apr 21 '23 at 15:09
Hey thanks Patrick. I've updated my table with the expected output for GroupID. The important part is that it's not necessarily always 1 row before when calculating the 60 minutes gap. For instance, look at my result set entry #4, it was filed at 5:50 pm but it gets a new GoupID of 1. That's because it's not looking to the previous row when calculating, it's looking to the previous GroupID 1 (which is 4:05) and adding 60 minutes to that (5:05). Since it's after 5:05, it starts over at 1. — ThisGuyLA13, Apr 21 '23 at 18:51
The "previous" row refers to the previous value within the group. The `OVER` clause takes care of this for us. — Patrick Hurst, Apr 22 '23 at 19:04
You can see this behavior demonstrated between group 1, sequence 7 and group 2, sequence 1. — Patrick Hurst, Apr 22 '23 at 19:08
Ok did you see my updated table? The query does not output the correct groups though as expected, ie - GroupID? — ThisGuyLA13, Apr 22 '23 at 19:13
In your example for Patient 297 the group ID should be (DateTime - GroupID): 12:50 - 1, 13:05 - 2, 13:20 - 3, 13:35 - 4, 13:50 - 5, 14:05 - 1 (the GroupID now starts over at 1 since this is greater than 13:50), 14:20 - 2, 15:50 - 1 (the GroupID now starts over at 1 since this is greater than 15:05), 16:05 - 2, 16:20 - 3 — ThisGuyLA13, Apr 22 '23 at 21:12
I added another method to the answer which I think meets this new requirement. — Patrick Hurst, May 04 '23 at 15:33

Assign Group ID based on time between rows that are dynamically separated (not just 1 row apart)

1 Answers1