I have a data frame in R with several columns that could each serve as a grouping variable. I'm trying to write a function that takes the data frame 'df', a list or vector of grouping variables 'vars', a time variable 'time' and a status variable 'status', and returns both the survival results from 'survfit' and a Kaplan-Meier curve from 'ggsurvplot'.

The goal is to avoid copying and pasting the same code for every variable.

Take the data below as an example:

library(survival)
library(survminer)  # provides ggsurvplot()
library(dplyr)

df <- lung %>%
  transmute(time,
            status,  # censoring status 1=censored, 2=dead
            Age = age,
            Sex = factor(sex, labels = c("Male", "Female")),
            ECOG = factor(ph.ecog),
            `Meal Cal` = as.numeric(meal.cal))

# help(lung)

# Turn status into (0=censored, 1=dead)
df$status <- ifelse(df$status == 2, 1, 0)

I can certainly run a single survival analysis like this:

fit <- survfit(Surv(time, status) ~ ECOG, data = df)

ggsurvplot(fit,
           pval = TRUE, pval.coord = c(750, 0.3), 
           conf.int = FALSE, 
           surv.median.line = "hv", 
           legend = c(0.8, 0.6), 
           legend.title = "",
           risk.table = "absolute", 
           risk.table.y.text = FALSE,  
           xlab = "Time (days)", ylab = "Survival", 
           palette="jco",
           title = "Overall Survival", font.title = c(16, "bold", "black")
)

However, I'd have to copy and paste everything again to do the same with Sex. So I'd like a function that takes a data frame 'df', a vector of grouping variables 'vars', a time variable 'time' and a status variable 'status', and returns, for each variable, both the 'survfit' result and the Kaplan-Meier curve from 'ggsurvplot'. This is my attempt:

vars <- c("ECOG", "Sex")

surv_plot_func <- function(df, vars, time, status) {
  results_list <- lapply(vars, function(var, time, status) {
    
    # Fit a survival model
    fit <- survfit(Surv(as.numeric(df[[time]]), as.logical(df[[status]])) ~ as.factor(df[[var]]), data = df)
    
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
    
    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
  })
  
  # Return the list of results
  results_list
}

res_list <- surv_plot_func(df, vars, "time", "status")

However, it didn't work. Any ideas?

Allan

1 Answer

The code below works for me.
As I mentioned in a comment, the error comes from ggsurvplot(): it could not find the object form when form was created inside the function passed to lapply().

So I assigned form globally with <<-, and it finally worked.

library(survival)
library(survminer)
library(dplyr)

df <- lung %>%
  transmute(time,
            status,  # censoring status 1=censored, 2=dead
            Age = age,
            Sex = factor(sex, labels = c("Male", "Female")),
            ECOG = factor(ph.ecog),
            `Meal Cal` = as.numeric(meal.cal))

vars <- c("ECOG", "Sex")

surv_plot_func <- function(df, vars, time, status) {

  results_list <- lapply(vars, \(x){
    # Build the formula as a string; `<<-` assigns it globally so that
    # ggsurvplot() can still find `form` later
    form <<- paste0("Surv(", time, ", ", status, ") ~ ", x)
    fit <- survfit(as.formula(form), data=df)
    
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")

    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
  })
  
  # Return the list of results
  return(results_list)
}

res_list <- surv_plot_func(df, vars, "time", "status")
res_list[[1]]$ggsurv

res_list[[2]]$ggsurv

Created on 2023-04-12 with reprex v2.0.2
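
A possible way to avoid the global assignment, sketched under the assumption that ggsurvplot() fails only because the call stored in the fit refers to the local symbol form: overwrite that symbol in the stored call with the formula object itself. The helper name fit_by_group is hypothetical, and the sketch uses only the survival package with the raw lung columns.

```r
library(survival)

# Hypothetical helper: fit a Kaplan-Meier model for one grouping column
fit_by_group <- function(data, group_var, time = "time", status = "status") {
  # Build the formula from column names supplied as strings
  form <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", group_var))
  fit <- survfit(form, data = data)
  # The stored call holds the symbol `form`; replace it with the formula
  # itself so later re-evaluation does not need a variable named `form`
  fit$call$formula <- form
  fit
}

# One fit per grouping column, named after the column
group_vars <- c("sex", "ph.ecog")
fits <- lapply(group_vars, function(v) fit_by_group(lung, v))
names(fits) <- group_vars
```

With survminer loaded, a call such as ggsurvplot(fits$sex, data = lung) would be the natural next step. The data argument is passed explicitly because the stored call still refers to the local name data. I have not checked this against every survminer version.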

YH Jang
  • Error in as.formula(form) : object 'form' not found – Allan Feb 14 '23 at 13:23
  • There's no point in passing `time` or `status` to that function. Also really bad idea to use a variable named `vars` when working in the tidyverse. There are existing functions by that name. If you drop the two unneeded parameters and rename `vars` to `var_s` you can get it to run properly. – IRTFM Feb 26 '23 at 03:11
  • @IRTFM, following your suggestion, it did work, but only without the ggsurvplot part. With this part I receive the message `Error in as.formula(form) : object 'form' not found` `Called from: as.formula(form)` – Allan Apr 12 '23 at 03:07
  • Try using as.formula in the definition of form. – IRTFM Apr 12 '23 at 04:20
  • Seems like the error is due to `ggsurvplot()`, not `form`. The KM curve is plotted when using `plot(form)`. – YH Jang Apr 12 '23 at 06:28
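
To illustrate the comment about not passing `time` or `status` to the inner function, here is a minimal base R sketch with made-up names (outer_fn, items, suffix). lapply() supplies only the first argument, so an inner function that declares its own `suffix` never receives a value for it, while one that omits the parameter picks it up from the enclosing function.

```r
# Hypothetical demo of argument shadowing inside lapply()
outer_fn <- function(items, suffix) {
  # Inner function declares its own `suffix`; lapply() never supplies it,
  # so evaluating it raises a "missing argument" error, captured here
  bad <- tryCatch(
    lapply(items, function(x, suffix) paste0(x, suffix)),
    error = function(e) conditionMessage(e)
  )
  # Inner function without the extra parameter sees the outer `suffix`
  good <- lapply(items, function(x) paste0(x, suffix))
  list(bad = bad, good = good)
}

res <- outer_fn(c("a", "b"), "_1")
```

Here res$bad holds the captured error message and res$good holds the pasted strings, which mirrors why dropping the two extra parameters from the inner function in the question lets survfit() run.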