
I am referring to the following vignette on time-dependent survival models in R:

https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf

From the vignette example:

subject time1 time2 death creatinine
 5       0     90    0     0.9
 5       90    120   0     1.5
 5       120   185   1     1.2

My data are in the following format (dates are dd-mm-yyyy):

subject date           death creatinine
 5       01-01-2022     0     0.9
 5       01-04-2022     0     1.5
 5       01-05-2022     0     1.2
 5       05-07-2022     1     1.2

I need to convert my data into the time1/time2 format used in the vignette example.

Mark

1 Answer


You can't fill in time2 for the last row without more information. In single-event data, if a subject has the event (as in your example), time2 in the final row is typically the time of the event. For subjects who don't have the event, time2 is typically the time at which observation of that subject ended.

So, leaving the final time2 value per subject empty, you can do something like this:

library(dplyr)

df %>%
  # convert date from dd-mm-yyyy text to Date (%Y = four-digit year)
  mutate(date = as.Date(date, "%d-%m-%Y")) %>%
  # arrange the rows by subject and date
  arrange(subject, date) %>%
  # group by subject
  group_by(subject) %>%
  # for each subject, create time2 and time1
  mutate(
    # days from the subject's first date to the date of the next row
    time2 = as.numeric(lead(date) - min(date)),
    time1 = lag(time2),
    time1 = if_else(row_number() == 1, 0, time1)
  ) %>%
  ungroup() %>%
  # move time1 next to time2
  relocate(time1, .before = time2)

Output:

  subject date       death creatinine time1 time2
    <int> <date>     <int>      <dbl> <dbl> <dbl>
1       5 2022-01-01     0        0.9     0    90
2       5 2022-04-01     0        1.5    90   120
3       5 2022-05-01     1        1.2   120    NA
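
If the last row for each subject records the date of the event (or the end of observation), as described at the top of this answer, that row can be used to close the previous interval instead of leaving time2 as NA. The following is only a sketch under two assumptions that are not stated in the question: each creatinine value applies from its measurement date until the next row, and death on a row describes the status at that date. The name intervals is just an illustrative choice.

```r
library(dplyr)

# Example data from the question (dates are dd-mm-yyyy)
df <- data.frame(
  subject    = c(5, 5, 5, 5),
  date       = c("01-01-2022", "01-04-2022", "01-05-2022", "05-07-2022"),
  death      = c(0, 0, 0, 1),
  creatinine = c(0.9, 1.5, 1.2, 1.2)
)

intervals <- df %>%
  mutate(date = as.Date(date, format = "%d-%m-%Y")) %>%
  arrange(subject, date) %>%
  group_by(subject) %>%
  mutate(
    # interval start: days since the subject's first row
    time1 = as.numeric(date - min(date)),
    # interval end: days until the next row's date (NA on the last row)
    time2 = as.numeric(lead(date) - min(date)),
    # assumption: status at the end of the interval comes from the next row
    death = lead(death)
  ) %>%
  # the last row only closes the previous interval, so drop it
  filter(!is.na(time2)) %>%
  ungroup() %>%
  select(subject, time1, time2, death, creatinine)

print(intervals)
```

For the example subject this should give three rows with (time1, time2) of (0, 90), (90, 120) and (120, 185), and death = 1 only on the last one, which is the layout shown in the vignette. Subjects with a single row would be dropped by the filter, so they would need separate handling.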
langtang