-2

The given dataset contains a timestamp in the third column which consists of a date in mm/dd/yyyy format with time in 24 hr format for the month of January. I wish to find the difference in minutes using R by comparing every row with its previous row only if patient has a common value i.e. difference of every timestamp row with previous row if patient value of both is same say "1","2" and "3". This also means that first row of the dataset should give value 0 minutes as there is nothing to compare. Thanks and please help.

patient  handling                time
1        Registration            1/2/2017 11:41
1        Triage and Assessment   1/2/2017 12:40
1        Registration            1/2/2017 12:40
1        Triage and Assessment   1/2/2017 22:32
1        Blood test              1/5/2017 8:59
1        Blood test              1/5/2017 14:34
1        MRI SCAN                1/5/2017 21:37
2        X-Ray                   1/7/2017  4:31
2        X-Ray                   1/7/2017  7:57
2        Discuss Results         1/7/2017 14:45
2        Discuss Results         1/7/2017 17:55
2        Check-out               1/9/2017 17:09
2        Check-out               1/9/2017 19:14
3        Registration            1/4/2017  1:34
3        Registration            1/4/2017  6:36
3        Triage and Assessment   1/4/2017 17:49
3        Triage and Assessment   1/5/2017 8:59
3        Blood test              1/5/2017 21:37
3        Blood test              1/6/2017 3:53
Sotos
  • 51,121
  • 6
  • 32
  • 66
Ashmin Kaul
  • 860
  • 2
  • 12
  • 37

2 Answers2

4

If time is already of class POSIXct, and the data frame is already sorted by patient and time, the time difference in minutes can be appended using a streamlined version of SBista's answer

library(dplyr)
DF %>% 
  group_by(patient) %>% 
  mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins")) 
 # A tibble: 19 x 4
 # Groups:   patient [3]
   patient              handling                time     delta
     <chr>                 <chr>              <dttm>    <time>
 1       1          Registration 2017-01-02 11:41:00    0 mins
 2       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3       1          Registration 2017-01-02 12:40:00    0 mins
 4       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5       1            Blood test 2017-01-05 08:59:00 3507 mins
 6       1            Blood test 2017-01-05 14:34:00  335 mins
 7       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9       2                 X-Ray 2017-01-07 07:57:00  206 mins
10       2       Discuss Results 2017-01-07 14:45:00  408 mins
11       2       Discuss Results 2017-01-07 17:55:00  190 mins
12       2             Check-out 2017-01-09 17:09:00 2834 mins
13       2             Check-out 2017-01-09 19:14:00  125 mins
14       3          Registration 2017-01-04 01:34:00    0 mins
15       3          Registration 2017-01-04 06:36:00  302 mins
16       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18       3            Blood test 2017-01-05 21:37:00  758 mins
19       3            Blood test 2017-01-06 03:53:00  376 mins

Another approach would be to compute the delta for all rows ignoring the grouping by patient and then to replace the first value for each patient by zero as requested by the OP. Ignoring the groups in first place might bring a performance gain (not verified).

Unfortunately, I am not proficient enough to implement this using dplyr syntax, so I use data.table with its update by reference:

library(data.table)
setDT(DF)[, delta := difftime(time, shift(time), units = "mins")][]
DF[DF[, first(.I), by = patient]$V1, delta := 0][]
    patient              handling                time     delta
 1:       1          Registration 2017-01-02 11:41:00    0 mins
 2:       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3:       1          Registration 2017-01-02 12:40:00    0 mins
 4:       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5:       1            Blood test 2017-01-05 08:59:00 3507 mins
 6:       1            Blood test 2017-01-05 14:34:00  335 mins
 7:       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8:       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9:       2                 X-Ray 2017-01-07 07:57:00  206 mins
10:       2       Discuss Results 2017-01-07 14:45:00  408 mins
11:       2       Discuss Results 2017-01-07 17:55:00  190 mins
12:       2             Check-out 2017-01-09 17:09:00 2834 mins
13:       2             Check-out 2017-01-09 19:14:00  125 mins
14:       3          Registration 2017-01-04 01:34:00    0 mins
15:       3          Registration 2017-01-04 06:36:00  302 mins
16:       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17:       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18:       3            Blood test 2017-01-05 21:37:00  758 mins
19:       3            Blood test 2017-01-06 03:53:00  376 mins
Uwe
  • 41,420
  • 11
  • 90
  • 134
2

You can do the following:

 data %>%
  group_by(patient) %>%
  mutate(diff_in_sec = as.POSIXct(time, format = "%m/%d/%Y %H:%M") - lag(as.POSIXct(time, format = "%m/%d/%Y %H:%M"), default=first(as.POSIXct(time, format = "%m/%d/%Y %H:%M"))))%>%
  mutate(diff_in_min = as.numeric(diff_in_sec/60))

You get the output as:

 # A tibble: 19 x 5
# Groups:   patient [3]
   patient              handling           time diff_in_sec diff_in_min
     <int>                 <chr>          <chr>      <time>       <dbl>
 1       1          Registration 1/2/2017 11:41      0 secs           0
 2       1 Triage and Assessment 1/2/2017 12:40   3540 secs          59
 3       1          Registration 1/2/2017 12:40      0 secs           0
 4       1 Triage and Assessment 1/2/2017 22:32  35520 secs         592
 5       1            Blood test  1/5/2017 8:59 210420 secs        3507
 6       1            Blood test 1/5/2017 14:34  20100 secs         335
 7       1              MRI SCAN 1/5/2017 21:37  25380 secs         423
 8       2                 X-Ray  1/7/2017 4:31      0 secs           0
 9       2                 X-Ray  1/7/2017 7:57  12360 secs         206
10       2       Discuss Results 1/7/2017 14:45  24480 secs         408
11       2       Discuss Results 1/7/2017 17:55  11400 secs         190
12       2             Check-out 1/9/2017 17:09 170040 secs        2834
13       2             Check-out 1/9/2017 19:14   7500 secs         125
14       3          Registration  1/4/2017 1:34      0 secs           0
15       3          Registration  1/4/2017 6:36  18120 secs         302
16       3 Triage and Assessment 1/4/2017 17:49  40380 secs         673
17       3 Triage and Assessment  1/5/2017 8:59  54600 secs         910
18       3            Blood test 1/5/2017 21:37  45480 secs         758
19       3            Blood test  1/6/2017 3:53  22560 secs         376
SBista
  • 7,479
  • 1
  • 27
  • 58