I have a great number of events to analyse, all in separate columns as shown in my example below. I would like to create a model for each column labeled IDXX, through a FOR loop, in R.

However, the code always fails with the error `Time and status are different lengths`.

library(survival)
library(survminer)
data = read.csv("data.csv")

# dummy data.csv
#ID  Sex  Age ID1 ID2 ID3 ID4 Time Xfactor
#1    1    55   0   1   0   0   1     12
#2    2    56   0   0   0   0   2     13
#3    2    61   0   0   0   1   3      1
#4    2    62   0   0   1   0   4      3
#5    1    40   0   0   0   0   1      4
 
#   time = time to death
#   event/death = a value of 1 in the individual IDs. Each ID is to be investigated separately, i.e., a different model has to be generated for each ID.
list <- c("ID1", "ID2", "ID3", "ID4")
covariates <- c("Sex", "Age")

for (i in 1:4){
univ_formulas <- sapply(covariates,
                        function(x) as.formula(paste('Surv(Time, ', list[i], ')~', x)))
                        
univ_models <- lapply( univ_formulas, function(x){coxph(x, data = data)})

univ_results <- lapply(univ_models,
                       function(x){ 
                          x <- summary(x)
                          p.value<-signif(x$wald["pvalue"], digits=2)
                          wald.test<-signif(x$wald["test"], digits=2)
                          beta<-signif(x$coef[1], digits=2);#coefficient beta
                          HR <-signif(x$coef[2], digits=2);#exp(beta)
                          HR.confint.lower <- signif(x$conf.int[,"lower .95"], 2)
                          HR.confint.upper <- signif(x$conf.int[,"upper .95"],2)
                          HR <- paste0(HR, " (", 
                                       HR.confint.lower, "-", HR.confint.upper, ")")
                          res<-c(beta, HR, wald.test, p.value)
                          names(res)<-c("beta", "HR (95% CI for HR)", "wald.test", 
                                        "p.value")
                          return(res)
                         })
res <- t(as.data.frame(univ_results, check.names = FALSE))
as.data.frame(res)

res.cox <- coxph(Surv(Time, **list[i]**) ~ Sex + Age + Xfactor, data = data)

summary(res.cox)

survivalsummary <- summary(res.cox)
csvexport <- paste0(list[i], ".csv")  # paste0 avoids the space that paste() would put before ".csv"
write.csv(survivalsummary$coefficients, csvexport)

plotoutput <- ggsurvplot(survfit(res.cox, data = data), palette = "#2E9FDF")
p_plot <- plotoutput$plot
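pdfexport <- paste0(list[i], ".pdf")  # paste0 avoids the space that paste() would put before ".pdf"
ggsave(pdfexport, plot = p_plot, device = "pdf")  # save this iteration's plot explicitly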
}

I suspect that the `list[i]` in the `res.cox` call is causing the problem. (I marked it with asterisks above for emphasis; they are not in my actual code.) How should I solve this? I have searched Stack Overflow and other sites without success.

pyyyour
  • You can't insert a variable into a formula like that. Construct the correct formula as a string (e.g. using `paste` or `glue`) and then convert to formula using `as.formula`. – Axeman Apr 04 '23 at 20:19
  • Actually, you already do that correctly earlier, when defining `univ_formulas`. But when you create `res.cox` you need to do the same. – Axeman Apr 04 '23 at 20:20
  • Axeman, thank you for the tip, I did not know that. Yes, I did realise that it worked for the `univ_formulas` but I am unable to replicate this for the `res.cox`. Do you have any suggestions? – pyyyour Apr 04 '23 at 20:23
  • `res.cox <- coxph(Surv(Time, **list[i]**) ~ Sex + Age + Xfactor, data = data)` should be something like `res.cox <- coxph(as.formula(paste("Surv(Time,", list[i], ") ~ Sex + Age + Xfactor")), data = data)`. – Axeman Apr 04 '23 at 20:43
  • @Axeman thank you so much! That worked like a charm. I tried my best to come up with something similar but my attempt did not work. I will try my best to understand how you got to your statement. – pyyyour Apr 04 '23 at 20:55
  • For more insight, run the code step-by-step. Compare for example `y ~ list[i]` with `paste('y ~', list[i])` and `as.formula(paste('y ~', list[i]))`. Perhaps write it out in multiple steps, like `my_form <- as.formula(paste("Surv(Time,", list[i], ") ~ Sex + Age + Xfactor"))` and then `res.cox <- coxph(my_form, data = data)`. Have a look at the examples of `?as.formula`. – Axeman Apr 04 '23 at 21:09
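
The step-by-step comparison suggested in the last comment can be sketched in base R as follows. This is only an illustration, not the asker's final code: `event_cols` is a hypothetical stand-in for the question's `list` vector (renamed so it does not mask `base::list`), and `y` is a placeholder response. No model is fitted, so the survival package is not needed to run it.

```r
# Hypothetical stand-in for the question's `list` vector.
event_cols <- c("ID1", "ID2", "ID3", "ID4")
i <- 2

# 1. Written literally, the formula keeps the unevaluated expression
#    event_cols[i]; the column name "ID2" never appears in it.
f_literal <- y ~ event_cols[i]
all.vars(f_literal)   # "y" "event_cols" "i"

# 2. paste() substitutes the column name, but the result is only a string.
f_string <- paste("y ~", event_cols[i])
class(f_string)       # "character"

# 3. as.formula() parses that string into a real formula.
f_built <- as.formula(f_string)
all.vars(f_built)     # "y" "ID2"

# The same idea applied to every event column: one Cox formula per ID.
# Parsing the string does not require Surv() to be available yet.
formulas <- lapply(event_cols, function(col) {
  as.formula(paste("Surv(Time,", col, ") ~ Sex + Age + Xfactor"))
})
names(formulas) <- event_cols
all.vars(formulas[["ID3"]])  # "Time" "ID3" "Sex" "Age" "Xfactor"
```

In the literal version the model would look up `event_cols[i]` and get the single string "ID2" rather than the column of that name, which is presumably why `Surv()` reports that time and status have different lengths. Each element of `formulas` could then be passed to `coxph(formulas[[k]], data = data)` inside the loop.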
