Here is a sample dataset:

age = runif(200, min = 25, max = 70)
profile_id = seq(1, 200)
gender = sample(c("M", "F"), size = 200, replace = TRUE)
start_date = sample(seq(as.Date('2013/01/01'), as.Date('2014/01/01'), by = "month"), 200, replace = TRUE)
end_date = sample(seq(as.Date('2014/01/01'), as.Date('2016/01/01'), by = "month"), 200, replace = TRUE)

mydf = data.frame(profile_id, age, gender, start_date, end_date)

# Administrative censoring: the study ends on 2015-01-01
mydf$end_date[mydf$end_date > as.Date('2015/01/01')] = as.Date('2015/01/01')
# death = 1 if the event happened before the study end, 0 if censored
mydf$death = ifelse(mydf$end_date < as.Date('2015/01/01'), 1, 0)
# Follow-up time in days, stored as a plain number rather than a difftime
mydf$periods_alive = as.numeric(mydf$end_date - mydf$start_date)

Basically, if possible, I am trying to use some kind of survival regression model to predict, for those who are still alive at the end of the study period, their probabilities of survival in future time periods after the study: for example, the probability of survival at each month for the next 12 months.

I understand I could do something like this below to estimate probabilities of survival for new observations during the sample period (although I'm not entirely sure how to extract the probabilities from the predict function):

library(survival)

m1 = survreg(Surv(periods_alive, death) ~ age + gender, data = mydf)  # Weibull by default
mydf_alive = mydf[mydf$death == 0, ]  # censored (still alive) profiles
predict(m1, newdata = mydf_alive, type = 'quantile')

But I was curious whether there is a way to predict the probabilities of survival at some future time T for the censored observations. I'm not really hung up on using survival analysis if there's a better way to model these probabilities, but I thought there might be some way to do this. Any help on how to proceed would be greatly appreciated!

Imconfused
  • If you can fit a parametric survival model (exponential, Weibull) then you can get the predictions for free. – user2974951 Dec 14 '18 at 08:41
  • But for those who haven't had the event yet, would those predictions be the probability of the event in the time after the sample period? Or would they be the probability that they would have experienced the event in the sample period? – Imconfused Dec 14 '18 at 16:21

1 Answer

When used with type='quantile', the predict.survreg function has a default of c(0.1, 0.9) for the p (percentile) parameter. So you are getting a matrix of 2 predicted survival times, not probabilities, for each of your survivors. The "0.1" column is the predicted number of days until a predicted survival of 90%, while the "0.9" column is the predicted number of days until a predicted survival of 10%. (Each percentile is the complement of the fraction of survivors remaining: p is the cumulative probability of having had the event, so the values are the predicted times until that cumulative probability is reached.) You should read ?predict.survreg.
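To make the meaning of the two columns concrete, here is a minimal sketch that passes p explicitly. It assumes the question's simulated data; the seed, the column labels, and the removal of zero-length follow-up times are additions made here so the example runs reproducibly, not part of the original post.

```r
library(survival)

set.seed(1)  # arbitrary seed, added here only for reproducibility
n <- 200
mydf <- data.frame(
  age = runif(n, min = 25, max = 70),
  gender = sample(c("M", "F"), size = n, replace = TRUE),
  start_date = sample(seq(as.Date("2013-01-01"), as.Date("2014-01-01"), by = "month"),
                      n, replace = TRUE),
  end_date = sample(seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "month"),
                    n, replace = TRUE)
)
cutoff <- as.Date("2015-01-01")
mydf$end_date[mydf$end_date > cutoff] <- cutoff
mydf$death <- ifelse(mydf$end_date < cutoff, 1, 0)
mydf$periods_alive <- as.numeric(mydf$end_date - mydf$start_date)  # days
# Assumption: drop zero-length follow-up, which the default Weibull fit cannot use
mydf <- mydf[mydf$periods_alive > 0, ]

m1 <- survreg(Surv(periods_alive, death) ~ age + gender, data = mydf)
mydf_alive <- mydf[mydf$death == 0, ]

# One row per survivor, one column per requested percentile
q <- predict(m1, newdata = mydf_alive, type = "quantile", p = c(0.1, 0.9))
# Hypothetical labels: p = 0.1 means 10% have had the event, i.e. 90% survive
colnames(q) <- c("days_to_90pct_survival", "days_to_10pct_survival")
head(q)
```

The first column is always smaller than the second, since survival drops to 90% before it drops to 10%.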

We are basically assuming the Markov property: the probabilities do not change with the time someone has already survived. If someone is alive, then you are essentially resetting their survival to 100% and letting time go forward. If this were being done on the current set of survivors, I think you would probably want to update their ages to the current values first.
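A rough sketch of that "reset" idea follows, assuming the question's simulated data and the default Weibull fit. The seed, the 30-day month, the removal of zero-length follow-up times, and the helper `S` are assumptions introduced here. The Weibull survival function is evaluated with `pweibull`, using the usual mapping from a `survreg` fit (shape = 1/scale, scale = exp(linear predictor)). Because resetting is only exact when the hazard is constant, the sketch also computes the conditional probability S(t0 + t) / S(t0) as a possible alternative, not as something the answer itself proposes.

```r
library(survival)

set.seed(1)  # arbitrary seed, added here only for reproducibility
n <- 200
mydf <- data.frame(
  age = runif(n, min = 25, max = 70),
  gender = sample(c("M", "F"), size = n, replace = TRUE),
  start_date = sample(seq(as.Date("2013-01-01"), as.Date("2014-01-01"), by = "month"),
                      n, replace = TRUE),
  end_date = sample(seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "month"),
                    n, replace = TRUE)
)
cutoff <- as.Date("2015-01-01")
mydf$end_date[mydf$end_date > cutoff] <- cutoff
mydf$death <- ifelse(mydf$end_date < cutoff, 1, 0)
mydf$periods_alive <- as.numeric(mydf$end_date - mydf$start_date)  # days
# Assumption: drop zero-length follow-up, which the default Weibull fit cannot use
mydf <- mydf[mydf$periods_alive > 0, ]

m1 <- survreg(Surv(periods_alive, death) ~ age + gender, data = mydf)
alive <- mydf[mydf$death == 0, ]

# Hypothetical helper: Weibull survival S(t) for a vector of linear predictors
S <- function(t, lp) {
  pweibull(t, shape = 1 / m1$scale, scale = exp(lp), lower.tail = FALSE)
}

horizons <- 30 * (1:12)  # assumed month length of 30 days

# "Reset" approach: restart the clock at 0 and use each survivor's current age
alive_now <- alive
alive_now$age <- alive$age + alive$periods_alive / 365.25
lp_now <- predict(m1, newdata = alive_now, type = "lp")
surv_reset <- sapply(horizons, function(h) S(h, lp_now))

# Conditional alternative: P(T > t0 + h | T > t0), keeping baseline covariates
lp0 <- predict(m1, newdata = alive, type = "lp")
t0 <- alive$periods_alive
surv_cond <- sapply(horizons, function(h) S(t0 + h, lp0) / S(t0, lp0))

colnames(surv_reset) <- colnames(surv_cond) <- paste0("month_", 1:12)
round(head(surv_reset, 3), 3)
round(head(surv_cond, 3), 3)
```

Each row is a surviving profile and each column a future month. The two matrices coincide only when the fitted scale is 1 (the exponential case); otherwise the gap between them shows how much the memoryless assumption matters for this fit.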

IRTFM
  • Ahh ok cool that makes sense, so I could enter a sequence across 0-1 for the p parameter and simply find the percentile closest to one month survival, two month, etc. Thanks again for your help – Imconfused Dec 14 '18 at 19:11
  • Should work. If you are trying to make a survival curve, the x values would be the predicted times and the y values would be the inputs to the p argument of `predict.survreg( ..., type="quantile")` (use 1 - p to plot on the survival scale). – IRTFM Dec 14 '18 at 22:22
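The idea in these comments can be sketched as follows, again assuming the question's simulated data. The seed, the grid of p values, the 30-day month, the choice of the first censored profile, and the use of `approx` to interpolate are all assumptions made for illustration.

```r
library(survival)

set.seed(1)  # arbitrary seed, added here only for reproducibility
n <- 200
mydf <- data.frame(
  age = runif(n, min = 25, max = 70),
  gender = sample(c("M", "F"), size = n, replace = TRUE),
  start_date = sample(seq(as.Date("2013-01-01"), as.Date("2014-01-01"), by = "month"),
                      n, replace = TRUE),
  end_date = sample(seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "month"),
                    n, replace = TRUE)
)
cutoff <- as.Date("2015-01-01")
mydf$end_date[mydf$end_date > cutoff] <- cutoff
mydf$death <- ifelse(mydf$end_date < cutoff, 1, 0)
mydf$periods_alive <- as.numeric(mydf$end_date - mydf$start_date)  # days
# Assumption: drop zero-length follow-up, which the default Weibull fit cannot use
mydf <- mydf[mydf$periods_alive > 0, ]

m1 <- survreg(Surv(periods_alive, death) ~ age + gender, data = mydf)

# Take the first censored profile as an example individual
person <- mydf[mydf$death == 0, ][1, ]

# p is the cumulative probability of the event; predict() returns the matching times
p <- seq(0.01, 0.99, by = 0.01)
times <- as.numeric(predict(m1, newdata = person, type = "quantile", p = p))

# Survival curve: x = predicted times, y = 1 - p
curve_df <- data.frame(time = times, surv = 1 - p)

# Read off survival at roughly 1..12 months by linear interpolation;
# rule = 2 clamps to the nearest end of the grid instead of returning NA
months <- 30 * (1:12)
monthly <- approx(x = curve_df$time, y = curve_df$surv, xout = months, rule = 2)$y
round(setNames(monthly, paste0("month_", 1:12)), 3)
```

To draw the curve, `plot(curve_df$time, curve_df$surv, type = "l")` should be enough. A finer grid of p values gives a smoother curve and a closer interpolation.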