Predict event probability per factor with a coxph model

Question

My question is a relatively simple one, but I couldn't find a clear answer on the forums. I am running a coxph model to predict the survival of individual plants that received one of two treatments at three different sites. The individuals were monitored for three years. My data and the associated model look like this:

library(survival)

# Generate data
mydata <- data.frame(Site = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)), 
                     Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)), 
                     Time = sample(c(1, 2, 3), 100, replace = TRUE), 
                     Surv = sample(c(0, 1), 100, replace = TRUE)) # Alive is 0, death is 1


# Model
mymodel <- coxph(Surv(Time , Surv) ~ Treatment*Site, 
              data = mydata)

What I want is the probability of death after 3 years for each site and each treatment (with the associated confidence interval). Is it possible to extract this information?

Based on forum threads that explored similar questions, my guess was to add three columns to my dataset using these commands:

mydata$fit <- survfit(mymodel, newdata=mydata)$surv
mydata$lower <- survfit(mymodel, newdata=mydata)$lower
mydata$upper<- survfit(mymodel, newdata=mydata)$upper

I would then keep only the rows I am interested in. However, this doesn't work: the command returns three times as many elements as there are rows in the original dataset (in this example, 300 instead of 100). Is there something I misunderstood?

ulfelder · Answer 1 · 2020-01-28T19:34:20.433

I think you're having this problem because the surv, lower, and upper elements of the object returned by survfit are matrices, not vectors: survfit gives you whole survival curves, not point predictions. Each column corresponds to one row of the data frame you passed as newdata (one combination of covariates), and each row corresponds to one of the distinct time points observed in your original data. With 100 rows in newdata and 3 distinct times, you get a 3 x 100 matrix, hence the 300 values. If you want the fitted values at a specific time, pull the matching row, e.g., fitted$surv[t,]. Note that the index is the position of that time in fitted$time; here the observed times are 1, 2, and 3, so row t happens to be time t.
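To make the shape of that object concrete, here is a minimal sketch that refits the question's model on the same simulated data and inspects what survfit returns. The seed and the names fit_all, row3, surv_at_3, and surv_at_3_summary are my own additions for illustration, not part of the original question.

```r
library(survival)

# Same simulated data as in the question (seed added for reproducibility)
set.seed(123)
mydata <- data.frame(
  Site = factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)),
  Treatment = factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)),
  Time = sample(1:3, 100, replace = TRUE),
  Surv = sample(c(0, 1), 100, replace = TRUE)  # alive is 0, death is 1
)
mymodel <- coxph(Surv(Time, Surv) ~ Treatment * Site, data = mydata)

# One survival curve per row of newdata
fit_all <- survfit(mymodel, newdata = mydata)
dim(fit_all$surv)   # rows = time points, columns = rows of newdata
fit_all$time        # the time point behind each row of the matrix

# Look the row up by time instead of assuming that row t is time t
row3 <- which(fit_all$time == 3)
surv_at_3 <- fit_all$surv[row3, ]

# summary() does the same lookup through its `times` argument
surv_at_3_summary <- as.numeric(summary(fit_all, times = 3)$surv)
```

Looking the row up through fit_all$time (or summary with times) should be safer than a hard-coded index when the observed times are not simply 1, 2, 3.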

To solve your specific problem, one option is to make a new data frame with only the combinations of covariates you want, then apply your model to it, then extract the row(s) representing the time step(s) you want. So, here...

library(survival)

# Generate data
set.seed(123)
mydata <- data.frame(Site = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)), 
                     Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)), 
                     Time = sample(seq(3), 100, replace = TRUE), 
                     Surv = sample(c(0, 1), 100, replace = TRUE)) # Alive is 0, death is 1


# Model
mymodel <- coxph(Surv(Time , Surv) ~ Treatment*Site, data = mydata)

# use expand.grid to get a table with all possible combinations of Site and Treatment
newdata <- with(mydata, expand.grid(Site = unique(Site), Treatment = unique(Treatment)))
# add a column for your time of interest for clarity's sake; survfit ignores it
newdata$time <- 3

# run survfit on that new table
fitted <- survfit(mymodel, newdata = newdata)

# extract the fitted values for the time of interest; the observed times are
# 1, 2, 3, so row 3 of each matrix corresponds to Time = 3
newdata$fit <- fitted$surv[3,]
newdata$lower <- fitted$lower[3,]
newdata$upper <- fitted$upper[3,]

# result
print(newdata)
   Site Treatment time       fit      lower     upper
1 SiteA   Treat.B    3 0.3149307 0.15064889 0.6583612
2 SiteC   Treat.B    3 0.1721691 0.04597197 0.6447887
3 SiteB   Treat.B    3 0.3979556 0.18679672 0.8478130
4 SiteA   Treat.A    3 0.6117692 0.37752270 0.9913616
5 SiteC   Treat.A    3 0.3390650 0.15646255 0.7347769
6 SiteB   Treat.A    3 0.3128776 0.13297313 0.7361819
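Since the question asks for the probability of death rather than survival, the estimates can be converted by taking one minus the survival probability. A small sketch, using the first and fourth rows of the output above as literal values; the data frame names surv_est and death are hypothetical, chosen only for this illustration. Note that the confidence limits swap: the lower bound for death comes from the upper bound for survival.

```r
# Survival estimates at year 3, copied from the printed result above
# (SiteA/Treat.B and SiteA/Treat.A)
surv_est <- data.frame(
  Site      = c("SiteA", "SiteA"),
  Treatment = c("Treat.B", "Treat.A"),
  fit       = c(0.3149307, 0.6117692),
  lower     = c(0.15064889, 0.37752270),
  upper     = c(0.6583612, 0.9913616)
)

# Probability of death = 1 - probability of survival;
# the interval bounds trade places under this transformation
death <- data.frame(
  Site      = surv_est$Site,
  Treatment = surv_est$Treatment,
  p_death   = 1 - surv_est$fit,
  lower     = 1 - surv_est$upper,
  upper     = 1 - surv_est$lower
)
print(death)
```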

score 1 · Answer 2 · answered Jan 29 '20 at 00:46

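Use predict.coxph with a time value in newdata. With type = "expected" it returns the expected number of events up to that time, and the survival probability is exp(-expected).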

testset <- data.frame(Time = 3, Surv = 1,  # the Surv value is just a placeholder
                      Treatment = factor(rep(c("Treat.A", "Treat.B"), times = 3)),
                      Site = factor(rep(c("SiteA", "SiteB", "SiteC"), each = 2)))

testset$Surv3yr <- exp(-predict(mymodel, newdata = testset, type = "expected"))
testset
  Time Surv Treatment  Site   Surv3yr
1    3    1   Treat.A SiteA 0.1633725
2    3    1   Treat.B SiteA 0.3906895
3    3    1   Treat.A SiteB 0.3432062
4    3    1   Treat.B SiteB 0.2940677
5    3    1   Treat.A SiteC 0.5411742
6    3    1   Treat.B SiteC 0.2047518
