I have a dataset in counting-process (start/stop) format, like the one below:
Patient_ID Time_Start Time_End X1 X2 X3 Status
001 0 1 0
001 1 2 0
001 2 3 0
001 3 4 0
002 0 1 0
002 1 2 0
002 2 3 0
002 3 4 1
Here X3 is a time-dependent covariate.
I fitted a Cox regression model as follows:
model.cox = coxph(Surv(Time_Start, Time_End, Status) ~ X1 + X2 + X3 + cluster(Patient_ID), data = mydata)
After fitting the model, I used predictSurvProb() from the "pec" package to predict each patient's survival probability at every time point:
predicted.surv.prob = predictSurvProb(model.cox, newdata = mydata, times = 1:4)
However, the function returned a result like the one below, in which each record (not each patient) has its own survival probabilities for months 1 through 4:
Patient_ID Time_Start Time_End Month1 Month2 Month3 Month4
001 0 1 0.99 0.98 0.97 0.96
001 1 2 0.985 0.976 0.968 0.965
001 2 3 .......................
001 3 4 .........................
002 0 1 ........................
002 1 2 ........................
002 2 3 ..........................
002 3 4
Obviously, this result does not make sense: patient 001 gets four sets of predicted probabilities, and each set differs from the others.
How can I tell predictSurvProb() that all records with the same ID belong to one patient, so that it returns a single set of predictions per patient?
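
For reference, here is a minimal sketch of one approach that might apply, using survfit() from the "survival" package rather than predictSurvProb(). Its id argument treats all rows of newdata that share an ID as the covariate path of a single subject and returns one curve per subject. The data are simulated and purely hypothetical (the sample size, effect sizes and seed are assumptions made only so the example runs), and the cluster() term is left out of the sketch because it changes the standard errors, not the coefficients.

```r
library(survival)

set.seed(1)
n <- 200  # hypothetical number of patients

# Simulated data in counting-process form: one row per patient-month.
mydata <- do.call(rbind, lapply(seq_len(n), function(i) {
  rows <- data.frame(Patient_ID = i,
                     Time_Start = 0:3, Time_End = 1:4,
                     X1 = rnorm(1), X2 = rbinom(1, 1, 0.5),
                     X3 = rnorm(4),           # time-dependent covariate
                     Status = 0)
  # Monthly event probability depends on X1 and X3 (made-up effects).
  p  <- plogis(-2 + 0.5 * rows$X1 + 0.5 * rows$X3)
  ev <- which(rbinom(4, 1, p) == 1)
  if (length(ev) > 0) {
    rows <- rows[seq_len(ev[1]), ]            # drop rows after the event
    rows$Status[ev[1]] <- 1
  }
  rows
}))

model.cox <- coxph(Surv(Time_Start, Time_End, Status) ~ X1 + X2 + X3,
                   data = mydata)

# Covariate paths for two patients; newdata must also contain the
# start, stop and status columns when id is used.
newd <- mydata[mydata$Patient_ID %in% c(1, 2), ]

# One curve per Patient_ID instead of one curve per row.
fit <- survfit(model.cox, newdata = newd, id = Patient_ID)
summary(fit, times = 1:4)
```

Each resulting curve is conditional on that patient's whole covariate path and only covers the intervals present in newdata, so a patient whose records stop at month 2 gets no prediction for months 3 and 4.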