
I am trying to make a forest plot for my model with ggforest(). The code below creates mock data that reproduces the problem. The data are in the counting-process (start, stop) format that Therneau describes for time-dependent covariates, which I suspect is why ggforest() does not work properly.

library(survival)
library(survminer)

set.seed(1)
repetitions<-floor(sample(rnorm(1:10, 10)))
id<-rep(1:10, times=repetitions )
age<-rep(floor(sample(18:80,10)),times=repetitions)
diabetes<-rep(sample(0:1,10,replace=TRUE), times=repetitions)
bil<-sample(4:60,length(id), replace=TRUE)


# status: 1 = censored interval, 2 = event on each subject's last row
status<-rep(1,length(id))
indices<-cumsum(repetitions)   # row index of each subject's last record
status[indices]<-2

# daystart restarts at 1 for every subject and runs to repetitions[i]
daystart<-sequence(repetitions)

dayend<-daystart+1

mock_data<-cbind.data.frame(id,age,diabetes, bil, daystart, dayend, status)
mock_data$agegroup<-cut(mock_data$age, 2)



fit2<-coxph(Surv(daystart,dayend, status)~bil+diabetes+strata(agegroup), data=mock_data)
ggforest(fit2 , data=mock_data)

I get

Error in `[.data.frame`(data, , var) : undefined columns selected

I tried installing an earlier version of the broom package (0.5.6), as suggested in previous threads, but it didn't resolve the issue. I used R versions 3.6.1 and 4.1.1. Any ideas?

EDIT: ggforest() gets confused by the +strata() term. Removing +strata() from the formula produces a plot.

1 Answer

The problem is in this line of the ggforest() function:

terms <- attr(model$terms, "dataClasses")[-1]

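As a quick fix, I copy-pasted the body of ggforest() into a new function of my own and added the index "-4", so that the strata() entry is not included in terms. (For this model the "dataClasses" entries are, in order, the Surv() response, bil, diabetes and strata(agegroup), so the fourth one is the strata term.)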
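A hard-coded "-4" only works when the strata term happens to be the fourth entry. Below is a minimal sketch of a more general filter, assuming the strata entries can be recognised by their "strata(" name prefix. It uses the lung data shipped with survival as a stand-in model, and the variable names `classes` and `terms_no_strata` are just illustrative. It is not a patch for ggforest() itself, only a demonstration of what the "dataClasses" attribute contains and how the strata entry could be dropped by name.

```r
library(survival)

# Stand-in model (not the mock data from the question): two covariates
# plus a strata() term, fitted on the lung data that ships with survival.
fit <- coxph(Surv(time, status) ~ age + ph.ecog + strata(sex), data = lung)

# Same attribute that ggforest() reads; the first entry is the Surv() response.
classes <- attr(fit$terms, "dataClasses")
print(names(classes))

# Drop the response, then drop every entry whose name starts with "strata(".
terms_no_strata <- classes[-1]
terms_no_strata <- terms_no_strata[!grepl("^strata\\(", names(terms_no_strata))]
print(names(terms_no_strata))
```

The names that remain are plain column names, so indexing the data frame with them should no longer hit the "undefined columns selected" error, which presumably arises because "strata(sex)" (or "strata(agegroup)" in the question) is not a column of the data.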

The original function could probably be changed to accommodate this and exclude strata from terms. I should stress, though, that I am not great at math or statistics, so I am not 100% sure whether stratifying a time-varying Cox proportional hazards analysis is valid when the stratification variable is continuous, such as age. Each stratum would then contain data for only the few individuals who share the same age, each with repeated measurements.
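To make the concern about stratum size concrete, here is a small sketch that compares stratifying on raw age with stratifying on a binned age, as the question does with cut(age, 2). The ages are drawn the same way as in the question (ten subjects, sampled without replacement from 18:80), but the actual values will differ from the mock data because the random number stream is not in the same state.

```r
set.seed(1)

# One age per subject, drawn without replacement as in the question.
age <- sample(18:80, 10)

# Stratifying on raw age: one stratum per distinct value.
n_strata_raw <- length(unique(age))
print(n_strata_raw)

# Stratifying on age cut into two bins: two strata that share the subjects.
agegroup <- cut(age, 2)
print(table(agegroup))
```

With raw age every subject ends up alone in a stratum, whereas the two-bin version keeps several subjects per stratum.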