I have a dataset as below:

Patient_ID  Time_Start  Time_End  X1  X2  X3  Status
001         0           1                     0
001         1           2                     0
001         2           3                     0
001         3           4                     0
002         0           1                     0
002         1           2                     0
002         2           3                     0
002         3           4                     1

Here X3 is a time-dependent covariate.

I fitted a Cox regression model as follows:

model.cox <- coxph(Surv(Time_Start, Time_End, Status) ~ X1 + X2 + X3 + cluster(Patient_ID), data = mydata)
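To make the setup reproducible, here is a minimal sketch of the same kind of model fitted to simulated data. Everything about the data is an assumption made for illustration: the number of patients, the covariate distributions, and the event probabilities are invented, and only the column names and the start/stop layout follow the table above. Note that R is case-sensitive, so the sketch uses the lowercase cluster() from the survival package on the Patient_ID column.

```r
library(survival)

set.seed(1)
n <- 200  # hypothetical number of patients

# Build one block of start/stop rows per patient (counting-process format).
rows <- lapply(seq_len(n), function(id) {
  x1 <- rbinom(1, 1, 0.5)  # baseline covariate, constant within a patient
  x2 <- rnorm(1)           # baseline covariate, constant within a patient
  out <- NULL
  for (t in 0:3) {
    x3 <- rnorm(1)         # time-dependent covariate, redrawn for each interval
    # Invented event mechanism, used only to generate some events
    event <- rbinom(1, 1, plogis(-2 + 0.5 * x1 + 0.5 * x3))
    out <- rbind(out, data.frame(Patient_ID = id,
                                 Time_Start = t, Time_End = t + 1,
                                 X1 = x1, X2 = x2, X3 = x3,
                                 Status = event))
    if (event == 1) break  # follow-up ends at the event
  }
  out
})
mydata <- do.call(rbind, rows)

# Same model formula as in the question, with robust variance by patient
model.cox <- coxph(Surv(Time_Start, Time_End, Status) ~ X1 + X2 + X3 +
                     cluster(Patient_ID), data = mydata)
print(coef(model.cox))
```

Each patient contributes between one and four rows here, which is the same structure as my real data.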

After fitting the model, I used predictSurvProb() from the "pec" package to predict each patient's survival probability at every time point:

predicted.surv.prob <- predictSurvProb(model.cox, newdata = mydata, times = 1:4)

However, the function returned one row per record, as shown below, so each record has its own survival probabilities for months 1 through 4:

Patient_ID  Time_Start  Time_End  Month1  Month2  Month3  Month4
001         0           1         0.99    0.98    0.97    0.96
001         1           2         0.985   0.976   0.968   0.965
001         2           3         ...
001         3           4         ...
002         0           1         ...
002         1           2         ...
002         2           3         ...
002         3           4

This result does not make sense to me: patient 001 has four sets of predicted probabilities, and each set differs from the others.

How can I tell predictSurvProb() that all records with the same ID belong together, so that it returns only one set of predictions per patient?

soniCYouth