I have a data frame in R with several columns that could each serve as the grouping variable (the right-hand side of the formula) in a survival analysis. I'm trying to write a function that takes the data frame 'df', a list or vector of variable names 'vars', a time variable 'time', and a status variable 'status', and returns both the 'survfit' results and a Kaplan-Meier curve drawn with 'ggsurvplot'.
The goal is to avoid copying and pasting the same code for every variable.
Take the data below as an example:
library(ggplot2)
library(survival)
library(survminer) # provides ggsurvplot()
library(dplyr)
df <- lung %>%
  transmute(time,
            status, # censoring status 1=censored, 2=dead
            Age = age,
            Sex = factor(sex, labels = c("Male", "Female")),
            ECOG = factor(ph.ecog),
            `Meal Cal` = as.numeric(meal.cal))
# help(lung)
# Turn status into (0=censored, 1=dead)
df$status <- ifelse(df$status == 2, 1, 0)
I can run a single survival analysis like this:
fit <- survfit(Surv(time, status) ~ ECOG, data = df)
ggsurvplot(fit,
           pval = TRUE, pval.coord = c(750, 0.3),
           conf.int = FALSE,
           surv.median.line = "hv",
           legend = c(0.8, 0.6),
           legend.title = "",
           risk.table = "absolute",
           risk.table.y.text = FALSE,
           xlab = "Time (days)", ylab = "Survival",
           palette = "jco",
           title = "Overall Survival", font.title = c(16, "bold", "black")
)
However, I'd have to copy and paste all of this again to do the same for Sex. So I'd like a function that loops over the variables in 'vars' and returns, for each one, both the 'survfit' result and the 'ggsurvplot' Kaplan-Meier curve. This is my attempt:
vars <- c("ECOG", "Sex")
surv_plot_func <- function(df, vars, time, status) {
  results_list <- lapply(vars, function(var, time, status) {
    # Fit a survival model
    fit <- survfit(Surv(as.numeric(df[[time]]), as.logical(df[[status]])) ~ as.factor(df[[var]]),
                   data = df)
    # Plot the Kaplan-Meier curve using ggsurvplot
    ggsurv <- ggsurvplot(fit, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv", xlab = "Time", ylab = "Survival Probability")
    # Return the fit and ggsurv as a list
    list(fit = fit, ggsurv = ggsurv)
  })
  # Return the list of results
  results_list
}
res_list <- surv_plot_func(df, vars, "time", "status")
However, it didn't work. Any ideas?
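For reference, here is a minimal sketch of one direction that might work; it is not a confirmed fix. Two things in the attempt look suspicious. First, the inner function declares 'time' and 'status' as its own arguments, but lapply only supplies 'var', so they are missing inside the function. Second, ggsurvplot needs to recover the formula and data from the fit, which tends to fail when the formula is written with df[[var]]. The sketch builds the formula from the column names and uses surv_fit() from survminer. The function name km_by_var is hypothetical, and the data frame is rebuilt from 'lung' only to keep the example self-contained.

```r
library(survival)
library(survminer)  # provides ggsurvplot() and surv_fit()

# Same recoding as in the question: status 0 = censored, 1 = dead
df <- transform(lung,
                status = ifelse(status == 2, 1, 0),
                Sex    = factor(sex, labels = c("Male", "Female")),
                ECOG   = factor(ph.ecog))

# km_by_var is a hypothetical name, not an existing library function
km_by_var <- function(df, vars, time, status) {
  results <- lapply(vars, function(var) {
    # Build e.g. Surv(time, status) ~ ECOG from the column names
    form <- as.formula(sprintf("Surv(%s, %s) ~ %s", time, status, var))
    # surv_fit() keeps the formula and data retrievable for ggsurvplot()
    fit <- surv_fit(form, data = df)
    ggsurv <- ggsurvplot(fit, data = df, pval = TRUE, conf.int = TRUE,
                         risk.table = TRUE, legend.title = "",
                         surv.median.line = "hv",
                         xlab = "Time", ylab = "Survival Probability")
    # Return the fit and the plot object together
    list(fit = fit, ggsurv = ggsurv)
  })
  # Name each result after its grouping variable
  setNames(results, vars)
}

res_list <- km_by_var(df, c("ECOG", "Sex"), "time", "status")
```

Printing res_list$ECOG$ggsurv should then draw the curve for ECOG. Column names that are not syntactic, such as `Meal Cal`, would need backticks in the pasted formula.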