I have data generated by intermittent interviews in which an individual is asked whether they are experiencing a certain symptom. The last time each individual was known not to have this symptom is denoted tstart. If applicable, the time at which the individual is observed to have developed the symptom is tstop. Using the survival package in R, I create a survival object with the Surv function, specifying that the data are interval censored. I would like a non-parametric maximum likelihood estimate of the survival function. This can be obtained with the survfit function, which seems to pass the call to an internal function, survfitTurnbull. The resulting confidence intervals are implausibly wide, and I am unable to figure out why.
# A random sample of the data using dput()
temp_df <- structure(list(tstart = c(0.01, 38, 0.01, 0.01, 23, 26, 0.01,
19, 0.01, 0.01, 22, 6, 0.01, 14, 16, 0.01, 0.01, 0.01, 0.01,
21, 15, 0.01, 0.01, 13, 10, 0.01, 0.01, 19, 0.01, 0.01, 0.01,
0.01, 22, 17, 27, 14, 16, 0.01, 20, 27, 10, 0.01, 0.01, 16, 20,
7, 6, 15, 0.01, 0.01), tstop = c(4.01, NA, 5.01, 8.01, NA, NA,
5.01, NA, 3.01, 16.01, NA, 6.01, 8.01, NA, NA, 7.01, 16.01, 1.01,
10.01, NA, NA, 5.01, 8.01, NA, NA, 2.01, 3.01, NA, 7.01, 5.01,
2.01, 9.01, NA, NA, NA, NA, NA, 10.01, NA, NA, NA, 5.01, 10.01,
NA, NA, NA, 7.01, NA, 14.01, 4.01)), row.names = c(NA, -50L), class = "data.frame")
library(survival)

# Interval-censored response: tstop = NA marks a right-censored observation
survObj <- with(temp_df, Surv(time = tstart, time2 = tstop, type = "interval2"))
survFit <- survfit(survObj ~ 1)
summary(survFit)
The confidence interval does not narrow over time, and it is no narrower when I use the whole dataset (which contains approximately 10 times as many events). I am unable to figure out what is going wrong.
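To rule out a coding problem, here is a minimal sketch of how I understand the interval2 coding to work. The toy data frame and its values are hypothetical (they are not rows from my dataset), and the status codes in the comments are my reading of the survival documentation: a missing tstop should be treated as right censored, and a finite tstart/tstop pair as interval censored.

```r
library(survival)

# Hypothetical toy rows for illustration only (not taken from the real data)
toy <- data.frame(tstart = c(0.01, 38, 6),
                  tstop  = c(4.01, NA, 6.01))

toySurv <- with(toy, Surv(time = tstart, time2 = tstop, type = "interval2"))

# Printed form: intervals appear as [left, right], right-censored times as 38+
print(toySurv)

# Status codes as I understand them:
# 0 = right censored, 1 = exact event, 2 = left censored, 3 = interval censored
toyStatus <- unclass(toySurv)[, "status"]
print(toyStatus)
```

If the status column shows 3 for the rows with a tstop and 0 for the rows where tstop is NA, then the response is at least being coded the way I intended, and the problem would lie elsewhere.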