Using the survival package in R, we can use the "heart" dataset:
survfit(Surv(stop, event) ~ transplant, data = heart)
This outputs a model with n=172 (103 in the transplant=0 group and 69 in the transplant=1 group) and 75 events (30 in transplant=0; 45 in transplant=1).
And if we plot the K-M curve with the survminer package:
ggsurvplot(survfit(Surv(stop, event) ~ transplant, data = heart), risk.table = "nrisk_cumcensor", xlim=c(0,5*365), break.x.by = 365, conf.int=TRUE)
It shows that there are 103 and 69 individuals at risk to start with in each transplant group.
However, there are only 103 individuals in total (length(unique(heart$id))), not 172.
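To make the mismatch concrete, here is a minimal sketch that compares the number of rows with the number of distinct ids and prints the rows for one individual who appears more than once. It only assumes the heart data frame that ships with the survival package; the variable names n_rows, n_ids, rows_per_id and multi_id are just illustrative.

```r
library(survival)  # provides the heart data frame

n_rows <- nrow(heart)                  # number of records (one per interval)
n_ids  <- length(unique(heart$id))     # number of distinct individuals
rows_per_id <- table(table(heart$id))  # how many individuals have 1 row, 2 rows, ...

print(n_rows)
print(n_ids)
print(rows_per_id)

# All records for the first individual with more than one row
multi_id <- as.numeric(names(which(table(heart$id) > 1)))[1]
print(heart[heart$id == multi_id, c("id", "start", "stop", "event", "transplant")])
```

The last call shows how a single individual is split across several (start, stop] intervals, which is why counting rows overstates the number of people.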
Trying to force the id with either "id" or "cluster" (e.g. survfit(Surv(stop, event) ~ transplant, id=id, cluster=id, data = heart)) doesn't change the result.
How can we make the model understand there are multiple lines for each individual?