
I'm attempting to conduct survival analysis with time-varying covariates. The data comes from a longitudinal survey that is administered annually, and I have manipulated it to look like this:

id  event       end.time    income1      income2    income3     income4
1   1           3           8            10         13          8       
2   0           4           13           15         24          35

event indicates whether the event occurred (1) or the observation was censored (0), end.time is the time to the event or to censoring, and the time-varying covariates for each successive year are in the columns to the right. So for observation 1, the event occurred in year 3, and in year 1 they earned an income of 8 thousand dollars, and so on. For observation 2, the event is censored, and we have data up to year 4 (when the study ends).

In the end, I'd like my data to look something like this:

id  st.time end.time    event   inc
1   0       1           0       8
1   1       2           0       10
1   2       3           1       13
2   0       1           0       13
2   1       2           0       15
2   2       3           0       24
2   3       4           0       35

I've looked at the tmerge() and survSplit() functions but am unsure how to apply them in this situation. With survSplit(), it seems I could split the follow-up at yearly cutpoints, but I'm not sure how that would reshape the time-varying covariates.
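A minimal sketch of both functions on the two example rows, assuming the survival package is installed. survSplit() cuts each subject's follow-up at the yearly cut points but copies the wide income columns onto every piece unchanged, so the value for the matching year still has to be picked out. tmerge() instead expects the covariate in long form and attaches it with tdc(). The helper names (by_year, base, inc_long, futime, status, died) are illustrative only, not part of the question.

```r
library(survival)

# The two example subjects from the question, in wide form
df <- data.frame(id = 1:2, event = c(1, 0), end.time = c(3, 4),
                 income1 = c(8, 13), income2 = c(10, 15),
                 income3 = c(13, 24), income4 = c(8, 35))

## Option 1: survSplit() at yearly cut points
by_year <- survSplit(Surv(end.time, event) ~ ., data = df,
                     cut = 1:3, start = "st.time")
# Each row is now an interval (st.time, end.time]; the income columns are
# only copied, so select the column for the year that ends the interval
inc_mat <- as.matrix(by_year[paste0("income", 1:4)])
by_year$inc <- inc_mat[cbind(seq_len(nrow(by_year)), by_year$end.time)]
by_year <- by_year[order(by_year$id, by_year$st.time),
                   c("id", "st.time", "end.time", "event", "inc")]

## Option 2: tmerge() with a long table of yearly incomes
base <- data.frame(id = df$id, futime = df$end.time, status = df$event)
# First call sets each subject's follow-up window (0, futime] and the event flag
td <- tmerge(base, base, id = id, died = event(futime, status))
# Income reported for year k is assumed to apply over the interval (k - 1, k]
inc_long <- data.frame(id = rep(df$id, times = 4),
                       time = rep(0:3, each = nrow(df)),
                       inc = c(df$income1, df$income2,
                               df$income3, df$income4))
td <- tmerge(td, inc_long, id = id, inc = tdc(time, inc))
td <- td[order(td$id, td$tstart), c("id", "tstart", "tstop", "died", "inc")]
```

Either result is in counting-process form and could be passed to coxph() with a response such as Surv(st.time, end.time, event) (or Surv(tstart, tstop, died) for the tmerge() version).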

Would a generic reshape work better?
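For comparison, a minimal sketch of the generic-reshape route using base R's reshape(), assuming the income columns are named income1 to income4 and are in year order. The year column is an intermediate helper introduced here, not something from the question.

```r
# The two example subjects from the question, in wide form
df <- data.frame(id = 1:2, event = c(1, 0), end.time = c(3, 4),
                 income1 = c(8, 13), income2 = c(10, 15),
                 income3 = c(13, 24), income4 = c(8, 35))

# Wide -> long: one row per id and year, income values stacked into `inc`
long <- reshape(df, direction = "long",
                varying = paste0("income", 1:4), v.names = "inc",
                timevar = "year", times = 1:4, idvar = "id")

# Drop years after each subject's event or censoring time
long <- long[long$year <= long$end.time, ]
long <- long[order(long$id, long$year), ]

# Build (st.time, end.time] intervals; keep the event flag only in the last year
long$st.time <- long$year - 1
long$event <- ifelse(long$year == long$end.time, long$event, 0)
long$end.time <- long$year
long <- long[c("id", "st.time", "end.time", "event", "inc")]
rownames(long) <- NULL
```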

Any advice would be appreciated.

Ryan
  • How do you get the values for the st.time, end.time, event, and censor columns? – Ronak Shah Jun 10 '20 at 01:58
  • The start time for every observation is 0, and goes up in increments of 1 (for each year). The income1, income2, is associated with the measurements in those follow up years. The end.time should be the event2yr. If the event didn't happen, then that observation is censored. In actuality, the censor column might not need to exist to run a Cox regression. I'll take that out. – Ryan Jun 10 '20 at 02:01
  • As a follow up, I've aligned the column names to make the two data frames consistent. – Ryan Jun 10 '20 at 02:08
  • `id = 2` should have more rows, right? – Ronak Shah Jun 10 '20 at 02:09
  • Yes. I truncated the data, but yes it should. I will edit it again. – Ryan Jun 10 '20 at 02:10

1 Answer


A general reshape with tidyr's pivot_longer(), followed by some dplyr manipulation, should work.

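library(dplyr)
library(tidyr)

df %>%
  # one row per id and year; the income values go into `inc`
  pivot_longer(cols = starts_with('income'), values_to = 'inc') %>%
  group_by(id) %>%
  # keep the years up to each subject's end.time
  # (assumes the income columns are in year order)
  slice(1:first(end.time)) %>%
  # rebuild the (st.time, end.time] interval for each row and keep
  # the event flag only on the subject's last interval
  mutate(end.time = row_number(),
         st.time = end.time - 1,
         event = replace(event, -n(), 0)) %>%
  ungroup() %>%
  select(-name)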


#     id event end.time   inc st.time
#  <int> <dbl>    <dbl> <int>   <dbl>
#1     1     0        1     8       0
#2     1     0        2    10       1
#3     1     1        3    13       2
#4     2     0        1    13       0
#5     2     0        2    15       1
#6     2     0        3    24       2
#7     2     0        4    35       3

Data:

df <- structure(list(id = 1:2, event = 1:0, end.time = 3:4, income1 = c(8L, 
13L), income2 = c(10L, 15L), income3 = c(13L, 24L), income4 = c(8L, 
35L)), class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • This definitely seems to be the correct answer. I need to explore pivot_longer further, as I actually have more variables than income. All of the variables are named the same way, for instance income_t1, educ_t1. I keep wrangling the data, but it only seems to pull some of the information. – Ryan Jun 11 '20 at 01:13
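A possible sketch for several variables named like income_t1, educ_t1: pivot_longer() can send the part before _t to separate value columns through the ".value" sentinel in names_to, and the year number to its own column. The educ_t* values below are made-up placeholders, and the column pattern assumes every time-varying variable ends in _t followed by the year. Filtering on the parsed year, rather than slicing by position, also avoids depending on column order.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: income_t1..income_t4 plus placeholder educ_t1..educ_t4
wide <- tibble(
  id = 1:2, event = c(1L, 0L), end.time = c(3L, 4L),
  income_t1 = c(8, 13), income_t2 = c(10, 15),
  income_t3 = c(13, 24), income_t4 = c(8, 35),
  educ_t1 = c(12, 16), educ_t2 = c(12, 16),
  educ_t3 = c(13, 16), educ_t4 = c(13, 17)
)

long <- wide %>%
  # "income_t2" -> value column `income`, year "2"; same for every prefix
  pivot_longer(
    cols = matches("_t[0-9]+$"),
    names_to = c(".value", "year"),
    names_pattern = "(.*)_t([0-9]+)$"
  ) %>%
  mutate(year = as.integer(year)) %>%
  # keep the years up to each subject's event or censoring time
  filter(year <= end.time) %>%
  # event is computed before end.time is overwritten (mutate is sequential)
  mutate(
    st.time = year - 1L,
    event = if_else(year == end.time, event, 0L),
    end.time = year
  ) %>%
  select(id, st.time, end.time, event, income, educ)
```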