0

Is there a way to assign unique Group ID to rows in SQL based off on how much time is between each one (in this case 60 minutes). For patient 614 I would like to start the first row at 12:50pm and that would be Group ID 1 and since the next BP is past 1:50pm (60 minutes), it would be Group ID 1 since its at 4:05pm. But the next Blood Pressure is at 5:00pm (within 60 minutes of 4:05pm, so it would be Group ID 2.

For patient 297 I would need to it to start at the first row (8/27 at 1pm since this is the first Blood Pressure) and this would be row # 1 and then start going through each row (to see if a row is <= 60 min from 1pm), if it is it would be #2, then #3, etc. But once its over 60 min, I then need it to start over again at the next #1 grouping (which is anything over 60 minutes or the next patient CSN), and use the same logic, going through each row to see if <= 60 min. I've included a screen shot of the expected Group ID I would like to see for each patient.

The grouping would also need to start over at a new patient, since patients have unique ID (CSN).

Desired Outcome:

enter image description here

1 Answers1

0

It sounds like this is a part of the classic "Overlapping Date Ranges" that you can google if you'd like more details. Another similar problem that you'll find a lot of literature on is the "Gaps and Islands Problem."

Basically, you want to do these steps:

  1. add a new column + 60 minutes, or whatever your buffer is, to get an "end time"
  2. Lead and lag so you can calculate the difference to the "next" one, as well as calculate if an island "started" or not

I have a very generic template that I run for this problem, it probably includes a lot of stuff you don't need. Also it works in Snowflake flavor, so you probably will have to adjust it for your specific RDBMS. But hopefully, this helps you:

WITH CTE_CONDITION AS (
  SELECT 
    BP_DATETIME AS dtm 
  FROM 
    ExampleTable 
  WHERE 
    FIRST_BP_READING IS NOT NULL 
    AND BP_DATETIME is not null
), 
CTE_LAGGED AS (
  SELECT 
    dtm, 
    LAG(dtm) OVER (
      ORDER BY 
        dtm
    ) AS previous_datetime, 
    LEAD(dtm) OVER (
      ORDER BY 
        dtm
    ) AS next_datetime, 
    ROW_NUMBER() OVER (
      ORDER BY 
        CTE_CONDITION.dtm
    ) AS island_location 
  FROM 
    CTE_CONDITION
), 
CTE_ISLAND_START AS (
  SELECT 
    ROW_NUMBER() OVER (
      ORDER BY 
        dtm
    ) AS island_number, 
    dtm AS island_start_datetime, 
    island_location AS island_start_location 
  FROM 
    CTE_LAGGED 
  WHERE 
    (
      DATEDIFF(HOUR, previous_datetime, dtm) > 60 
      OR CTE_LAGGED.previous_datetime IS NULL
    )
), 
CTE_ISLAND_END AS (
  SELECT 
    ROW_NUMBER() OVER (
      ORDER BY 
        dtm
    ) AS island_number, 
    dtm AS island_end_datetime, 
    island_location AS island_end_location 
  FROM 
    CTE_LAGGED 
  WHERE 
    DATEDIFF(HOUR, dtm, next_datetime) > 60 
    OR CTE_LAGGED.next_datetime IS NULL
) 
SELECT 
  CTE_ISLAND_START.island_start_datetime, 
  CTE_ISLAND_END.island_end_datetime, 
  DATEDIFF(
    HOUR, CTE_ISLAND_START.island_start_datetime, 
    CTE_ISLAND_END.island_end_datetime
  ) AS ISLAND_DURATION_HOUR, 
  (
    SELECT 
      COUNT(*) 
    FROM 
      CTE_LAGGED 
    WHERE 
      CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime 
      AND CTE_ISLAND_END.island_end_datetime
  ) AS island_row_count 
FROM 
  CTE_ISLAND_START 
  INNER JOIN CTE_ISLAND_END ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number
Josh
  • 1,493
  • 1
  • 13
  • 24
  • This looks like it is summarizing. Do you know if there is a way to provide a unique Group ID based on the desired outcome above? – ThisGuyLA13 Apr 19 '23 at 17:58