This problem has confounded me for more hours than I care to admit. I have isolated the problem so I can replicate it.

library(survival)
library(survminer)

set.seed(123)
test <- data.frame(rnorm(10000)+5,
                   sample(0:1, 10000, replace = TRUE))

colnames(test)<- c("time", "event")
#sum(test$event) = 4975
survfitted <- survfit(Surv(time = time, event = event) ~ 1,
                      data = test)
plot(survfitted, fun = "event")

Why does this curve reach 100% when only 49.75% of the subjects experience an event? What would be the right syntax for producing a plot showing the cumulative incidence proportion?

The problem appears to be that the censoring is treated as an event.

Jakn09ab
  • This question is probably more appropriate for Cross Validated. – Mike Mar 19 '20 at 13:22
  • This is some odd "survival" data. Usually KM curves show a decrease of the survival probability with time. Your data shows the opposite. That aside and more generally, Kaplan-Meier estimates show the change in *cumulative* probability (usually of survival) with time. KM estimates denote a cumulative probability and are therefore bounded by 0 and 1. – Maurits Evers Mar 19 '20 at 13:25
  • If the censoring events all occur before the last event, then the last event will take the KM curve to 0, or, as in this case, will take the Hazard curve to 1. – IRTFM Mar 25 '20 at 20:25

1 Answer

If the censoring events all occur before the last event, then the last event will take the KM curve to 0, or, as in this case, will take the Hazard curve to 1.0. (The plot is a KM estimate of Hazard rather than of Survival: with fun = "event" it shows 1 - S(t), the estimated cumulative probability of an event.)
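A minimal sketch of that rule, using a made-up six-subject dataset (the times and event indicators below are illustrative assumptions, not taken from the question). When the largest time is an event, the KM survival estimate drops to 0; when the largest time is censored, the curve stops short.

```r
library(survival)

# Hypothetical toy data: six subjects with distinct follow-up times.
time <- 1:6

# Case 1: the longest follow-up time (6) is an event.
event_last_is_event <- c(1, 0, 1, 0, 0, 1)
fit_event <- survfit(Surv(time, event_last_is_event) ~ 1)

# Case 2: the longest follow-up time (6) is censored.
event_last_is_censored <- c(1, 0, 1, 0, 1, 0)
fit_censored <- survfit(Surv(time, event_last_is_censored) ~ 1)

# Final value of 1 - S(t), which is what fun = "event" plots.
final_event <- 1 - min(fit_event$surv)
final_censored <- 1 - min(fit_censored$surv)
print(c(last_is_event = final_event, last_is_censored = final_censored))
```

In both cases only 3 of the 6 subjects have an event. In the first case the estimate still ends at 1, because the single subject still at risk at time 6 has the event. In the second case it ends at 1 - (5/6)(3/4)(1/2) = 0.6875.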

Your simulation distributed the events and censoring times evenly over the same range, so almost any such plot will show the Hazard function approaching very close to 1. If you choose 9 as your seed, you get a plot where it does not quite reach 1.

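set.seed(9)  # re-simulate and refit; otherwise the earlier survfitted object is replotted
test <- data.frame(time = rnorm(10000) + 5, event = sample(0:1, 10000, replace = TRUE))
survfitted <- survfit(Surv(time = time, event = event) ~ 1, data = test)
png(); plot(survfitted, fun = "event"); abline(h = 1); dev.off()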

The Hazard plot will always get close to 1 if the event and censoring times are distributed evenly across the same range. The reason that most medical examples of survival or hazard plots terminate in the middle of the 0-1 range is that typically there are many censoring times out beyond the last observed event.
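To illustrate that last point, here is a rough simulation sketch in which subjects are censored only at a fixed end of follow-up. The exponential event times, the rate, and the cutoff `study_end` are assumptions chosen for illustration, not part of the original question. Because nobody is censored before the cutoff, the final value of the curve equals the raw proportion of subjects with an event.

```r
library(survival)

set.seed(1)
n <- 1000
study_end <- 1                      # hypothetical end of follow-up
true_time <- rexp(n, rate = 0.7)    # hypothetical latent event times

# Subjects whose event would occur after study_end are censored at study_end.
event <- as.integer(true_time <= study_end)
time <- pmin(true_time, study_end)

fit_admin <- survfit(Surv(time, event) ~ 1)

# Final value of 1 - S(t) compared with the raw event proportion.
final_incidence <- 1 - min(fit_admin$surv)
print(c(km_final = final_incidence, raw_proportion = mean(event)))
```

Calling plot(fit_admin, fun = "event") on this fit should give a curve that levels off at the observed event proportion instead of climbing to 1, since all the censoring times lie beyond the last observed event.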

IRTFM
  • My actual data show the same issue. I think my encoding may be the problem. I want to create a plot of the cumulative proportion that initiates a treatment; some subjects are censored (death, study end). My dataset is structured as follows. Time column: numeric, the number of days from enrollment until drug initiation, death, or end of follow-up. Event column: numeric, containing 0s (did not experience the event within follow-up) and 1s (did experience the event within follow-up). – Jakn09ab Mar 26 '20 at 15:01