0

I have a large dataset of ID of patients with delays in days between the surgery and radiotherapy (RT) sessions. Some patients may had two or three RT treatments. To identidy those patients, I consider a delay being greater than 91 days (3 months). This delay of 91 days corresponds to the end of one RT treatment and the start of another one. For analysis purposes it may be set at 61 days (2 months). How to make correspond this delay above 91 days between two values to a new RT treatement and add a corresponding order into a new column? My database looks like this:

df1 <- data.frame (
id = c("a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b", "c","c","c","c"), 
delay = c(2,3,5,6, 3,5,7,9, 190,195,201,203,205, 1299,1303,1306,1307, 200,202,204,205))

        > df1
   id delay
1   a     2
2   a     3
3   a     5
4   a     6
5   b     3
6   b     5
7   b     7
8   b     9
9   b   190
10  b   195
11  b   201
12  b   203
13  b   205
14  b  1299
15  b  1303
16  b  1306
17  b  1307
18  c   200
19  c   202
20  c   204
21  c   205

I failed to produce something like this considering if the time between the first set of delays is greater than 100 days.

df2 <- data.frame (
id = c("a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b", "c","c","c","c"), 
delay = c(2,3,5,6, 3,5,7,9, 190,195,201,203,205, 1299,1303,1306,1307, 200,202,204,205),
tt_order = c("1st","1st","1st","1st"," 1st","1st","1st","1st"," 2nd","2nd","2nd","2nd","2nd"," 3rd","3rd","3rd","3rd"," 1st","1st","1st","1st"))

        > df2
   id delay tt_order
1   a     2      1st
2   a     3      1st
3   a     5      1st
4   a     6      1st
5   b     3      1st
6   b     5      1st
7   b     7      1st
8   b     9      1st
9   b   190      2nd
10  b   195      2nd
11  b   201      2nd
12  b   203      2nd
13  b   205      2nd
14  b  1299      3rd
15  b  1303      3rd
16  b  1306      3rd
17  b  1307      3rd
18  c   200      1st
19  c   202      1st
20  c   204      1st
21  c   205      1st

I will be grateful for any help you can provide.

CharlesLDN
  • 169
  • 1
  • 9

2 Answers2

0

One way would be to divide delay by 100 and then use match and unique to get unique index in a sequential fashion for each id.

library(dplyr)
df2 %>%
  group_by(id) %>%
  mutate(n_tt = floor(delay/100), 
         n_tt = match(n_tt, unique(n_tt)))

#   id    delay tt_order  n_tt
#   <chr> <dbl>    <dbl> <int>
# 1 a         2        1     1
# 2 a         3        1     1
# 3 a         5        1     1
# 4 a         6        1     1
# 5 b         3        1     1
# 6 b         5        1     1
# 7 b         7        1     1
# 8 b         9        1     1
# 9 b       150        2     2
#10 b       152        2     2
#11 b       155        2     2
#12 b       159        2     2
#13 b      1301        3     3
#14 b      1303        3     3
#15 b      1306        3     3
#16 b      1307        3     3
#17 c       200        1     1
#18 c       202        1     1
#19 c       204        1     1
#20 c       205        1     1

Created a new column n_tt for comparison purposes with tt_order in df2.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

@CharlesLDN - perhaps this might be what you are looking for. This will look at differences in delay within each id, and gaps of > 90 days will be considered a new treatment.

library(tidyverse)

df1 %>%
  group_by(id) %>%
  mutate(tt_order = cumsum(c(0, diff(delay)) > 90) + 1)

Output

   id    delay tt_order
   <chr> <dbl>    <dbl>
 1 a         2        1
 2 a         3        1
 3 a         5        1
 4 a         6        1
 5 b         3        1
 6 b         5        1
 7 b         7        1
 8 b         9        1
 9 b       190        2
10 b       195        2
11 b       201        2
12 b       203        2
13 b       205        2
14 b      1299        3
15 b      1303        3
16 b      1306        3
17 b      1307        3
18 c       200        1
19 c       202        1
20 c       204        1
21 c       205        1
Ben
  • 28,684
  • 5
  • 23
  • 45