-2

enter image description hereI have a table I'm using in SQL Server with Patient CSN/BP Reading and the DateTime it was filed.

It would also start over at 1, when a new patient row starts.

Patient CSN BP Reading DateTime GroupID
614 162/92 8/22/2022 12:50 PM 1
614 161/102 8/22/2022 4:05 PM 1
614 160/97 8/22/2022 5:00 PM 2
614 163/05 8/22/2022 5:50 PM 1
297 161/02 8/27/2022 1:00 PM 1
297 168/06 8/27/2022 1:37 PM 2
297 168/99 8/27/2022 7:01 PM 1
297 171/94 8/27/2022 7:16 PM 2
297 166/92 8/27/2022 7:38 PM 3
297 160/98 8/27/2022 8:05 PM 1
297 169/99 8/27/2022 8:22 PM 2
297 161/90 8/27/2022 10:01 PM 1
297 160/98 8/27/2022 10:33 PM 2
297 161/99 8/27/2022 11:31 PM 1
297 161/97 8/27/2022 11:52 PM 2
297 164/96 8/28/2022 4:54 AM 1
297 160/103 8/31/2022 1:00 PM 1

LAG/LEAD but no success because it jumps 1 row.

I need to add a GroupID column based on times between rows (if they are not more than 60 minutes apart). The problem is it's a little tricky because it's not always just 1 row below to check. It could be 1,2, 5, or 10+. So the first patient (Patient CSN 614), their first entry is at 12:50pm so that would obviously be GroupID 1. I then need SQL to go down and essentially search all rows below until it reaches the 60-minute end DateTime of 1:50 PM, and every row that is <= 1:50 PM would get a sequential GroupID 2,3, etc. It would then start over at GroupID = 1 on anything past 1:51 PM (in this case it was at 4:05 PM). The new end time would now be 60 minutes past 4:05 PM. Since the next row is <= 5:05 PM, it would get GroupID 2.

  • Does this answer your question? [Assign Unique Group ID based on the time an event happened (within 60 minutes of each other)](https://stackoverflow.com/questions/76036759/assign-unique-group-id-based-on-the-time-an-event-happened-within-60-minutes-of) – HABO Apr 21 '23 at 01:20
  • No it doesn’t . – ThisGuyLA13 Apr 21 '23 at 01:52

1 Answers1

0

When asking a question like this it's important to provide easily reproducible DDL/DML. This does not include pictures of data as it cannot be copied easily, and does not describe the data types that are in use. Consider:

DECLARE @Table TABLE (CSN INT, BloodPressure NVARCHAR(7), ReadingDateTime DATETIME);
INSERT INTO @Table (CSN, BloodPressure, ReadingDateTime) VALUES

(614, '147/87', '2022-8-22 13:35:00'), (614, '150/90', '2022-8-22 13:50:00'), (614, '149/89', '2022-8-22 14:05:00'),
(614, '148/88', '2022-8-22 16:20:00'), (297, '140/80', '2022-8-22 12:50:00'), (297, '139/79', '2022-8-22 13:05:00'),
(297, '138/78', '2022-8-22 13:20:00'), (297, '137/77', '2022-8-22 13:35:00'), (297, '140/80', '2022-8-22 13:50:00'),
(614, '148/88', '2022-8-22 14:20:00'), (614, '150/90', '2022-8-22 15:50:00'), (614, '149/89', '2022-8-22 16:05:00'),
(297, '139/79', '2022-8-22 14:05:00'), (297, '138/78', '2022-8-22 14:20:00'), (297, '140/80', '2022-8-22 15:50:00'),
(614, '150/90', '2022-8-22 12:50:00'), (614, '149/89', '2022-8-22 13:05:00'), (614, '148/88', '2022-8-22 13:20:00'),
(297, '139/79', '2022-8-22 16:05:00'), (297, '138/78', '2022-8-22 16:20:00');

Using this example data we can create a rCTE (Recursive Common Table Expression) and use that to get to your desired result set:

;WITH Base AS (
SELECT t.CSN, t.BloodPressure, t.ReadingDateTime, ROW_NUMBER() OVER (PARTITION BY t.CSN ORDER BY ReadingDateTime) AS Rn, 
       CASE WHEN LAG(t.ReadingDateTime,1) OVER (PARTITION BY t.CSN ORDER BY t.ReadingDateTime) IS NULL OR 
                 LAG(t.ReadingDateTime,1) OVER (PARTITION BY t.CSN ORDER BY t.ReadingDateTime) < DATEADD(SECOND,-(59*60)+59,t.ReadingDateTime) THEN 1 
            ELSE 0 
        END AS IsNewGroup
  FROM @Table t
), rCTE AS (
SELECT Base.CSN, Base.BloodPressure, Base.ReadingDateTime, Base.Rn, 1 AS GroupSeqeunce, ROW_NUMBER() OVER (PARTITION BY CSN ORDER BY Rn) AS GroupNumber
  FROM Base
 WHERE IsNewGroup =  1
UNION ALL
SELECT a.CSN, r.BloodPressure, r.ReadingDateTime, r.Rn, a.GroupSeqeunce + 1, a.GroupNumber
  FROM rCTE a
    INNER JOIN Base r
      ON a.CSN = r.CSN
      AND a.Rn = r.Rn - 1
      AND r.IsNewGroup = 0
)

SELECT CSN, BloodPressure, ReadingDateTime, GroupNumber, GroupSeqeunce
  FROM rCTE
 ORDER BY CSN, GroupNumber, GroupSeqeunce
CSN BloodPressure ReadingDateTime GroupNumber GroupSeqeunce
297 140/80 2022-08-22 12:50:00.000 1 1
297 139/79 2022-08-22 13:05:00.000 1 2
297 138/78 2022-08-22 13:20:00.000 1 3
297 137/77 2022-08-22 13:35:00.000 1 4
297 140/80 2022-08-22 13:50:00.000 1 5
297 139/79 2022-08-22 14:05:00.000 1 6
297 138/78 2022-08-22 14:20:00.000 1 7
297 140/80 2022-08-22 15:50:00.000 2 1
297 139/79 2022-08-22 16:05:00.000 2 2
297 138/78 2022-08-22 16:20:00.000 2 3
614 150/90 2022-08-22 12:50:00.000 1 1
614 149/89 2022-08-22 13:05:00.000 1 2
614 148/88 2022-08-22 13:20:00.000 1 3
614 147/87 2022-08-22 13:35:00.000 1 4
614 150/90 2022-08-22 13:50:00.000 1 5
614 149/89 2022-08-22 14:05:00.000 1 6
614 148/88 2022-08-22 14:20:00.000 1 7
614 150/90 2022-08-22 15:50:00.000 2 1
614 149/89 2022-08-22 16:05:00.000 2 2
614 148/88 2022-08-22 16:20:00.000 2 3

First we're doing some testing on the rows to find the 'Starter' rows. These are rows which are 60 minutes or more away from the previous row or do not have a previous row for the same CSN. We're using LAG with the OVER clause to PARTITION BY the CSN and ORDER BY the ReadingDateTime.

Once we know where to start, we can use an rCTE to iterate over the rows and form groups, and sequence them. The original 'order' of the rows doesn't matter when they're sorted by the windowed functions.

This approach iterates over each row in order to compare the reading time to the current group start time:

;WITH Base AS (
SELECT a.CSN, a.BloodPressure, a.ReadingDateTime, ROW_NUMBER() OVER (PARTITION BY a.CSN ORDER BY a.ReadingDateTime) AS Rn
  FROM @Table a
), rCTE AS (
SELECT a.CSN, a.BloodPressure, a.ReadingDateTime, a.Rn, a.ReadingDateTime AS GroupStartDateTime
  FROM Base a
 WHERE Rn = 1
UNION ALL
SELECT a.CSN, r.BloodPressure, r.ReadingDateTime, r.Rn, CASE WHEN r.ReadingDateTime < DATEADD(SECOND,3599,a.GroupStartDateTime) THEN a.GroupStartDateTime ELSE r.ReadingDateTime END AS GroupStartDateTime
  FROM rCTE a
    INNER JOIN Base r
      ON a.CSN = r.CSN
      AND a.Rn + 1 = r.Rn
)

SELECT t.CSN, t.BloodPressure, t.ReadingDateTime, 
       DENSE_RANK() OVER (PARTITION BY t.CSN ORDER BY t.GroupStartDateTime) AS GroupNumber, 
       ROW_NUMBER() OVER (PARTITION BY t.CSN, t.GroupStartDateTime ORDER BY t.ReadingDateTime) AS GroupSequence
  FROM rCTE t
  ORDER BY t.CSN, t.ReadingDateTime
CSN BloodPressure ReadingDateTime GroupNumber GroupSequence
297 140/80 2022-08-22 12:50:00.000 1 1
297 139/79 2022-08-22 13:05:00.000 1 2
297 138/78 2022-08-22 13:20:00.000 1 3
297 137/77 2022-08-22 13:35:00.000 1 4
297 140/80 2022-08-22 13:50:00.000 2 1
297 139/79 2022-08-22 14:05:00.000 2 2
297 138/78 2022-08-22 14:20:00.000 2 3
297 140/80 2022-08-22 15:50:00.000 3 1
297 139/79 2022-08-22 16:05:00.000 3 2
297 138/78 2022-08-22 16:20:00.000 3 3
614 150/90 2022-08-22 12:50:00.000 1 1
614 149/89 2022-08-22 13:05:00.000 1 2
614 148/88 2022-08-22 13:20:00.000 1 3
614 147/87 2022-08-22 13:35:00.000 1 4
614 150/90 2022-08-22 13:50:00.000 2 1
614 149/89 2022-08-22 14:05:00.000 2 2
614 148/88 2022-08-22 14:20:00.000 2 3
614 150/90 2022-08-22 15:50:00.000 3 1
614 149/89 2022-08-22 16:05:00.000 3 2
614 148/88 2022-08-22 16:20:00.000 3 3
Patrick Hurst
  • 2,086
  • 1
  • 3
  • 13
  • Thank you, this is very close. When looking at my example above for patient 614, the group sequence (for times 12:50, 16:05, 17:00, 17:50) should be 1, 1, 2, 1. The output on the CTE gives me 1, 1, 2, 3. Any thoughts? – ThisGuyLA13 Apr 21 '23 at 13:09
  • And looking at the table that you made, for patient 297, the 7th entry at 14:20 is when the grouping should start over since it's >1 hour past 12:50. At that point, 14:20 would be #1 all the way up to 15:20, and anything past 15:20 would start over at 1. The "IsnewGroup" column seems to be just using LAG for 1 previous row, but it's not always 1 row (by looking at the examples above). Make sense? – ThisGuyLA13 Apr 21 '23 at 15:07
  • @ThisGuyLA13 I can't comment on the results for data which hasn't be provided. If you'd like to provide it, please do and we can go from there. `2022-08-22 14:20` is included in the first group as it occurs within 60 minutes of the previous `2022-08-22 14:05`. The next one for that patient is more than 60 minutes after that (`2022-08-22 15:50`), so it starts a new group. – Patrick Hurst Apr 21 '23 at 15:09
  • Hey thanks Patrick. I've updated my table with the expected output for GroupID. The important part is that it's not necessarily always 1 row before when calculating the 60 minutes gap. For instance, look at my result set entry #4, it was filed at 5:50 pm but it gets a new GoupID of 1. That's because it's not looking to the previous row when calculating, it's looking to the previous GroupID 1 (which is 4:05) and adding 60 minutes to that (5:05). Since it's after 5:05, it starts over at 1. – ThisGuyLA13 Apr 21 '23 at 18:51
  • The "previous" row refers to the previous value within the group. The `OVER` clause takes care of this for us. – Patrick Hurst Apr 22 '23 at 19:04
  • You can see this behavior demonstrated between group 1, sequence 7 and group 2, sequence 1. – Patrick Hurst Apr 22 '23 at 19:08
  • Ok did you see my updated table? The query does not output the correct groups though as expected, ie - GroupID? – ThisGuyLA13 Apr 22 '23 at 19:13
  • In your example for Patient 297 the group ID should be (DateTime - GroupID): 12:50 - 1, 13:05 - 2, 13:20 - 3, 13:35 - 4, 13:50 - 5, 14:05 - 1 (the GroupID now starts over at 1 since this is greater than 13:50), 14:20 - 2, 15:50 - 1 (the GroupID now starts over at 1 since this is greater than 15:05), 16:05 - 2, 16:20 - 3 – ThisGuyLA13 Apr 22 '23 at 21:12
  • I added another method to the answer which I think meets this new requirement. – Patrick Hurst May 04 '23 at 15:33