How to add a new column based on conditionnal difference between rows

Question

I have a large dataset of ID of patients with delays in days between the surgery and radiotherapy (RT) sessions. Some patients may had two or three RT treatments. To identidy those patients, I consider a delay being greater than 91 days (3 months). This delay of 91 days corresponds to the end of one RT treatment and the start of another one. For analysis purposes it may be set at 61 days (2 months). How to make correspond this delay above 91 days between two values to a new RT treatement and add a corresponding order into a new column? My database looks like this:

df1 <- data.frame (
id = c("a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b", "c","c","c","c"), 
delay = c(2,3,5,6, 3,5,7,9, 190,195,201,203,205, 1299,1303,1306,1307, 200,202,204,205))

        > df1
   id delay
1   a     2
2   a     3
3   a     5
4   a     6
5   b     3
6   b     5
7   b     7
8   b     9
9   b   190
10  b   195
11  b   201
12  b   203
13  b   205
14  b  1299
15  b  1303
16  b  1306
17  b  1307
18  c   200
19  c   202
20  c   204
21  c   205

I failed to produce something like this considering if the time between the first set of delays is greater than 100 days.

df2 <- data.frame (
id = c("a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b", "c","c","c","c"), 
delay = c(2,3,5,6, 3,5,7,9, 190,195,201,203,205, 1299,1303,1306,1307, 200,202,204,205),
tt_order = c("1st","1st","1st","1st"," 1st","1st","1st","1st"," 2nd","2nd","2nd","2nd","2nd"," 3rd","3rd","3rd","3rd"," 1st","1st","1st","1st"))

        > df2
   id delay tt_order
1   a     2      1st
2   a     3      1st
3   a     5      1st
4   a     6      1st
5   b     3      1st
6   b     5      1st
7   b     7      1st
8   b     9      1st
9   b   190      2nd
10  b   195      2nd
11  b   201      2nd
12  b   203      2nd
13  b   205      2nd
14  b  1299      3rd
15  b  1303      3rd
16  b  1306      3rd
17  b  1307      3rd
18  c   200      1st
19  c   202      1st
20  c   204      1st
21  c   205      1st

I will be grateful for any help you can provide.

Why the `id` remains 2 in row 11 even when the `delay` is 201? — Ronak Shah, Aug 28 '20 at 10:04
Delay are days after the surgery thus for an ID it can take any value. — CharlesLDN, Aug 28 '20 at 10:13
Yes but don't we want to increment it for every 100 value? So all the values after 200 till 300 should be 3? — Ronak Shah, Aug 28 '20 at 10:31
Not really. I have rephrase my question in oder to be more clear. — CharlesLDN, Aug 28 '20 at 10:52

Ronak Shah · Answer 1 · 2020-08-28T09:43:20.580

0

One way would be to divide delay by 100 and then use match and unique to get unique index in a sequential fashion for each id.

library(dplyr)
df2 %>%
  group_by(id) %>%
  mutate(n_tt = floor(delay/100), 
         n_tt = match(n_tt, unique(n_tt)))

#   id    delay tt_order  n_tt
#   <chr> <dbl>    <dbl> <int>
# 1 a         2        1     1
# 2 a         3        1     1
# 3 a         5        1     1
# 4 a         6        1     1
# 5 b         3        1     1
# 6 b         5        1     1
# 7 b         7        1     1
# 8 b         9        1     1
# 9 b       150        2     2
#10 b       152        2     2
#11 b       155        2     2
#12 b       159        2     2
#13 b      1301        3     3
#14 b      1303        3     3
#15 b      1306        3     3
#16 b      1307        3     3
#17 c       200        1     1
#18 c       202        1     1
#19 c       204        1     1
#20 c       205        1     1

Created a new column n_tt for comparison purposes with tt_order in df2.

edited Aug 28 '20 at 09:43

answered Aug 28 '20 at 09:04

Ronak Shah

377,200
20
156
213

Hello. Thank you for your quick answer. I tried your solution to my original dataset but it failed to work, producing inappropriate order e.g. for late delays (>1000 days) I have thus modified my question. – CharlesLDN Aug 28 '20 at 09:25
@CharlesLDN When does the order restart with 1? – Ronak Shah Aug 28 '20 at 09:38
When the ID changes (a, then b then c...) – CharlesLDN Aug 28 '20 at 09:40
I have further modified my question as the delays may spanned over different hundreds (295, 298, 300, 305). – CharlesLDN Aug 28 '20 at 09:57

score 0 · Answer 2 · answered Aug 28 '20 at 13:30

@CharlesLDN - perhaps this might be what you are looking for. This will look at differences in delay within each id, and gaps of > 90 days will be considered a new treatment.

library(tidyverse)

df1 %>%
  group_by(id) %>%
  mutate(tt_order = cumsum(c(0, diff(delay)) > 90) + 1)

Output

   id    delay tt_order
   <chr> <dbl>    <dbl>
 1 a         2        1
 2 a         3        1
 3 a         5        1
 4 a         6        1
 5 b         3        1
 6 b         5        1
 7 b         7        1
 8 b         9        1
 9 b       190        2
10 b       195        2
11 b       201        2
12 b       203        2
13 b       205        2
14 b      1299        3
15 b      1303        3
16 b      1306        3
17 b      1307        3
18 c       200        1
19 c       202        1
20 c       204        1
21 c       205        1

How to add a new column based on conditionnal difference between rows

2 Answers2