My question is a relatively simple one but I couldn't find any clear answer from the different forums. I am running a coxph model to predict the survival of individual plants that experienced two treatments in three different sites. The individuals were monitored for three years. My data and the associated model looks like this:
# Generate data
mydata <- data.frame(Site = as.factor(sample(c("SiteA", "SiteB", "SiteC"), 100, replace = TRUE)),
Treatment = as.factor(sample(c("Treat.A", "Treat.B"), 100, replace = TRUE)),
Time = sample(c(1, 2, 3), 100, replace = TRUE),
Surv = sample(c(0, 1), 100, replace = TRUE)) # Alive is 0, death is 1
# Model
mymodel <- coxph(Surv(Time , Surv) ~ Treatment*Site,
data = mydata)
What I want is the probability of death after 3 years for each site and each treatment (and the associated confidence interval). Is it possible to extract this information ?
Based on the different forums that explored similar questions, my guess would have been to add three columns to my dataset using the command :
mydata$fit <- survfit(mymodel, newdata=mydata)$surv
mydata$lower <- survfit(mymodel, newdata=mydata)$lower
mydata$upper<- survfit(mymodel, newdata=mydata)$upper
And from this only keep the lines that I am interested in. However, this doesn't work and the command generates a vector with 3 times more elements than the original dataset (in this example, 300 instead of 100). Is there something that I misunderstood ?