
The software I am using gives me the summary output of the `survfit` function. What is the easiest way to take this information and plot it with `ggsurvplot`? I understand that this summary data is in a different format from the input `ggsurvplot` normally expects. Is there another function I should be using instead for a Kaplan-Meier curve? Any information would be much appreciated. Notably, the survival probabilities round to 1 in the summary output, so it would be great if I could use the `n.risk` and `n.event` columns to calculate more accurate survival estimates. Thanks!

Here is a sample of the summary output:

```r
structure(list(time = c(11L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 
20L, 21L), n.risk = c(399490L, 399133L, 398853L, 398558L, 398078L, 
397755L, 397487L, 397273L, 397108L, 396949L), n.event = c(1L, 
1L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 6L), survival = c(1, 1, 1, 1, 
1, 1, 1, 1, 1, 1), std.err = c(2.5e-06, 3.54e-06, 5.6e-06, 6.63e-06, 
7.52e-06, 7.93e-06, 8.69e-06, 9.73e-06, 1.04e-05, 1.21e-05), 
    lowerci = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), upperci = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -10L), class = "data.frame")
```
davidk
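For the rounding issue in the question, the product-limit estimate can be recomputed at full precision from the `n.risk` and `n.event` columns. This is a minimal sketch, assuming the summary rows include every event time from the start of follow-up (if earlier event times were trimmed from this excerpt, the values would still need to be multiplied by the survival just before time 11). It also recomputes Greenwood standard errors, which agree with the posted `std.err` column up to rounding, plus a log-type 95% interval, and optionally draws the curve with plain ggplot2 instead of `ggsurvplot()`. The names `km`, `se_log`, `n` and `d` are illustrative.

```r
# Summary rows from the question (first three columns of the dput() output)
km <- data.frame(
  time    = c(11L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L),
  n.risk  = c(399490L, 399133L, 398853L, 398558L, 398078L,
              397755L, 397487L, 397273L, 397108L, 396949L),
  n.event = c(1L, 1L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 6L)
)

# Convert to double: n * (n - d) would overflow R's 32-bit integers
n <- as.numeric(km$n.risk)
d <- as.numeric(km$n.event)

# Kaplan-Meier product-limit estimate: S(t) = product over t_i <= t of (1 - d_i / n_i)
km$surv <- cumprod(1 - d / n)

# Greenwood's formula: Var(S) = S^2 * sum over t_i <= t of d_i / (n_i * (n_i - d_i))
se_log <- sqrt(cumsum(d / (n * (n - d))))   # standard error of log(S)
km$std.err <- km$surv * se_log              # standard error on the survival scale

# 95% "log" confidence interval, with the upper limit capped at 1
z <- qnorm(0.975)
km$lower <- km$surv * exp(-z * se_log)
km$upper <- pmin(km$surv * exp(z * se_log), 1)

print(km, digits = 10)

# Optional: plot the step curve directly with ggplot2
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(km, aes(x = time, y = surv)) +
    geom_step() +
    geom_step(aes(y = lower), linetype = "dashed") +
    geom_step(aes(y = upper), linetype = "dashed") +
    labs(x = "Time", y = "Survival probability")
  print(p)
}
```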

1 Answer


I'm not sure you can recreate the survival plot without the original data. For example, using the built-in `lung` dataset:

```r
library(survival)
library(survminer)
#> Loading required package: ggplot2
#> Loading required package: ggpubr
#> 
#> Attaching package: 'survminer'
#> The following object is masked from 'package:survival':
#> 
#>     myeloma

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Create a 'summary object'
sum_fit <- summary(fit)

# Number at risk at every time point stored in the fit (absolute and relative to the maximum)
df1 <- data.frame(time=fit$time,
                  nRisk=fit$n.risk,
                  nRiskRel=fit$n.risk/max(fit$n.risk))

# The same, but only at the time points kept in the summary object
df2 <- data.frame(time_sum=sum_fit$time,
                  nRisk_sum=sum_fit$n.risk,
                  nRiskRel_sum=sum_fit$n.risk/max(sum_fit$n.risk))

# Overlay both sets of points on the ggsurvplot curve (black: fit, blue: summary)
ggplot1 <- ggsurvplot(fit, data = lung)$plot
ggplot1 +
  geom_point(aes(x=time, y=nRiskRel), data = df1, alpha=0.5, size=3) +
  geom_point(aes(x=time_sum, y=nRiskRel_sum), data = df2, alpha=0.5, size=3, color="blue")

nrow(df1)
#> [1] 206
nrow(df2)
#> [1] 150
```

Created on 2021-10-13 by the reprex package (v2.0.1)
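The 206 vs 150 row counts can be traced to `summary.survfit()`: by default (`censored = FALSE`) it keeps only time points with at least one event, whereas `fit$time` also contains times at which only censoring occurred. A quick check, assuming the same `lung` fit as above (the `n_*` variable names are just illustrative):

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

n_fit      <- length(fit$time)                             # every stored time, per stratum
n_sum      <- length(summary(fit)$time)                    # default: event times only
n_sum_cens <- length(summary(fit, censored = TRUE)$time)   # event and censoring times

print(c(fit = n_fit, summary = n_sum, summary_censored = n_sum_cens))
```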

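There are fewer points in the summary object ('sum_fit') than in the original fit ('fit'), because `summary()` by default drops time points at which only censoring occurred. This may be a problem if you want to recreate a survival curve plot exactly: the curve itself only drops at event times, but the censoring marks that ggsurvplot draws come from the omitted time points. The two objects also differ in structure ('sum_fit' has class "summary.survfit" rather than "survfit"), so you would need to convert between them to use the ggsurvplot function. I would be very interested to see if someone has a clever solution to this problem.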
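One possible workaround, sketched under assumptions rather than offered as a definitive method: because the summary lists `n.risk` and `n.event` at every event time, the number of subjects censored between consecutive event times is `n.risk[i] - n.event[i] - n.risk[i + 1]`. Rebuilding one pseudo-subject per event or censoring, with each censoring placed at the preceding event time, gives a data set whose `survfit()` fit reproduces the same Kaplan-Meier estimates, and that genuine `survfit` object can be passed to `ggsurvplot()`. This assumes the summary rows cover every event time and that everyone still at risk after the last listed time is censored there (here, the 10 sample rows from the question). The true censoring times are unknown, so censoring marks are switched off. The names `pseudo`, `fit_pseudo` and `n_censor` are hypothetical.

```r
library(survival)

# Summary rows from the question (time, n.risk, n.event at each event time)
km <- data.frame(
  time    = c(11L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L),
  n.risk  = c(399490L, 399133L, 398853L, 398558L, 398078L,
              397755L, 397487L, 397273L, 397108L, 396949L),
  n.event = c(1L, 1L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 6L)
)

# Subjects leaving the risk set without an event were censored in [t_i, t_{i+1}).
# Placing them at t_i leaves the KM estimate unchanged, because subjects censored
# at t_i still count as at risk at t_i.
# Assumption: everyone still at risk after the last listed time is censored there.
n_next   <- c(km$n.risk[-1], 0L)
n_censor <- km$n.risk - km$n.event - n_next
stopifnot(all(n_censor >= 0))

# One pseudo-subject per event (status 1) or censoring (status 0)
pseudo <- data.frame(
  time   = c(rep(km$time, km$n.event), rep(km$time, n_censor)),
  status = c(rep(1L, sum(km$n.event)), rep(0L, sum(n_censor)))
)

fit_pseudo <- survfit(Surv(time, status) ~ 1, data = pseudo)
s <- summary(fit_pseudo)
print(data.frame(time = s$time, n.risk = s$n.risk, n.event = s$n.event,
                 surv = s$surv, std.err = s$std.err), digits = 10)

# With survminer installed, the reconstructed fit goes straight into ggsurvplot()
if (requireNamespace("survminer", quietly = TRUE)) {
  p <- survminer::ggsurvplot(fit_pseudo, data = pseudo, censor = FALSE)
  print(p)
}
```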

jared_mamrot