Extracting Kaplan-Meier step function

Question

What I am trying to do is to fit 5 Kaplan Meier curves on 5 imputed datasets from MICE. What I aim to do is at every time point, take the average of the 5 survival probabilities. I think this would be easy if I had the exact form of the step function that makes up each of the KM curves, but I don't know how to extract that.

Here is an example of the code that I would run

#make up data
 survival_time=rexp(10,3)
 dead=sample(c(0,1),10,replace=TRUE)
 gender=sample(c(0,1),10,replace=TRUE)

 #induce missingness in gender
 gender[3:5]=NA
 data=cbind(survival_time,dead,gender)

 #do imputation
 imp=mice(data)

 #fit KM curves on each of the imputed datasets
 km_fit=with(imp,survfit(Surv(survival_time,dead)~gender))

 #now break down each km curve into male and female
 #and average the surv prob at each time
 #but how?

The challenge is that the survival time and death indicator are always fixed, but the amount in each gender changes between imputation. Because of this, the number in each group, and thus the number and time of events changes between imputation.

What my plan would be, supposing I could get the step functions would be to use apply predict on the step functions to get the means. Is this the best solution, or do you think there would be a better one?

I suppose that I could manually code up a Kaplan Meier estimate, but I would much rather use a built-in R function if it exists. — RayVelcoro, Aug 04 '15 at 15:43
Actually, might be easier to use the broom package. `library(broom)` and then `tidy(km_fit)`. — Benjamin, Aug 04 '15 at 15:47
@Benjamin The summary function is helpfulish, but not exactly. From the summary, I can extract the survival probabilities from each curve, but they are not on the same timescale. For example curve 1: time .13 , .14, .19, .21, .57, surv: .8, .6,.6,.3,0 Curve 2 time: .19, .57, surv: 1,0 So, in an ideal world, we would be able to take the mean at time .10: (1+1)/2=1 .13: (.8+1)/2=.9 .14 (.6+1)/2=.8 But we don't have this information handy (they have different times). — RayVelcoro, Aug 04 '15 at 15:49
You could probably make use of the `times` argument in summary to tell `summary` at what times to produce the estimates. — Benjamin, Aug 04 '15 at 15:52
I believe this solves my problem. Now I have a followup. What I want to do now is have a "master time" vector, and make a matrix with this vector, and cbind it to the 5 columns of KM estimates. I will then apply and make a 7th column with the average of the 5 KM estimates. However, when using the command summary(km_fit$analyses[[1]][1],time=master_list)$surv The lengths may be different, between men and women (depending on last event type). If I wanted to write one code for both, would it be appropriate to fill in the ones out of range with the last value? — RayVelcoro, Aug 04 '15 at 16:30
I don't think it is uncommon to carry the last known value out, but I've come to feel that doing so violates the reality that everybody dies. I am of the preference that those values should be filled as missing. In terms of comparing across groups, this is appropriate since we shouldn't be making comparisons at time points for which we have not observed data. — Benjamin, Aug 04 '15 at 21:12

score 1 · Accepted Answer · answered Aug 04 '15 at 21:13

1

To record the answer in the comments, this was resolved by using summary (km_fit) with the times argument.

answered Aug 04 '15 at 21:13

Benjamin

16,897
6
45
65

Extracting Kaplan-Meier step function

1 Answers1