I am currently struggling with recursive partitioning and bagging/bootstrapping of some data. As the data is confidential, I have provided a reproducible example using the "GBSG2" data. In essence, I am trying to reproduce an article recently published in the Journal of Clinical Oncology (https://ascopubs.org/doi/abs/10.1200/JCO.22.02222) with my own data on an identical patient population.
I have attached images of their methods section and a supplemental table, which is essentially what I hope to end up with.
My problem boils down to two points:
- For each terminal node, I want to extract the three-year survival rate and then assign each patient to a group: A (>70%), B (50-70%), C (25-50%), or D (<25%).
- When bootstrapping afterwards, the same needs to happen, so that I can see which group a specific patient was allocated to in each iteration and how often each allocation occurred.
Here is some dummy code showing what I have done so far:
library(partykit)
library(survival)  # provides Surv() and survfit()
data("GBSG2", package = "TH.data")
# Data frame
df <- GBSG2
# ctree object
stree <- ctree(Surv(time, cens) ~ ., data = df, control = ctree_control(minsplit = 50, alpha = 0.1, multiway = TRUE))
# The following part I hope could be done more efficiently
df$node <- predict(stree, type = "node")
nd <- factor(df$node)
fit1 <- survfit(Surv(time, cens) ~ nd, data = df)
summary(fit1, times = 365 * 3)
# Manually assign a group to each node by reading the summary output above
df$grp <- ifelse(df$node==3, "A",NA)
df$grp <- ifelse(df$node==4, "A", df$grp)
df$grp <- ifelse(df$node==7, "C", df$grp)
df$grp <- ifelse(df$node==8, "D", df$grp)
df$grp <- ifelse(df$node==9, "B", df$grp)
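To make clearer what I am after, here is a minimal sketch of how the manual step above might be automated. It computes a Kaplan-Meier estimate at three years within each terminal node and converts it to a group with cut(). The names surv3 and grp_by_node are my own, and how the boundaries are handled (for example, which group a node with exactly 70% survival falls into) is an assumption on my part, not something taken from the article.

```r
library(partykit)
library(survival)

data("GBSG2", package = "TH.data")
df <- GBSG2
stree <- ctree(Surv(time, cens) ~ ., data = df,
               control = ctree_control(minsplit = 50, alpha = 0.1, multiway = TRUE))

# Terminal node id for every patient
node <- predict(stree, type = "node")

# Three-year Kaplan-Meier survival within each terminal node;
# extend = TRUE returns an estimate even if follow-up in a node ends earlier
surv3 <- sapply(split(df, node), function(d) {
  km <- survfit(Surv(time, cens) ~ 1, data = d)
  summary(km, times = 3 * 365, extend = TRUE)$surv
})

# Assumed cut-offs: D < 25% <= C < 50% <= B < 70% <= A
grp_by_node <- setNames(
  as.character(cut(surv3, breaks = c(-Inf, 0.25, 0.50, 0.70, Inf),
                   labels = c("D", "C", "B", "A"), right = FALSE)),
  names(surv3)
)

# Look up each patient's group via their terminal node
df$node <- node
df$grp <- unname(grp_by_node[as.character(node)])
table(df$node, df$grp)
```

Looking the group up by node id avoids hard-coding node numbers, which would change whenever the tree changes.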
I believe the above needs to be fixed before the bootstrap can be done, so that the result matches the attached supplemental table. (I'd like to run 1000 iterations, but I'm using 10 until it works.)
# Bagging
library(dplyr)  # provides %>% and select()
df_bag <- df %>%
  select(-node, -grp)
cf <- cforest(Surv(time, cens) ~ ., data = df_bag, ntree = 10, mtry = Inf)
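As an alternative to cforest, the per-iteration bookkeeping I have in mind could be sketched with an explicit bootstrap loop. This is only my guess at a workable approach, not necessarily what the article did: each iteration resamples patients with replacement, refits the ctree, estimates three-year survival per terminal node from the resample, and then sends every original patient down that tree. The helper surv_to_group, the groups matrix, and the boundary handling are all assumptions of mine.

```r
library(partykit)
library(survival)

data("GBSG2", package = "TH.data")
df <- GBSG2

# Hypothetical helper: map three-year survival to groups A-D
# (assumed cut-offs: D < 25% <= C < 50% <= B < 70% <= A)
surv_to_group <- function(s) {
  as.character(cut(s, breaks = c(-Inf, 0.25, 0.50, 0.70, Inf),
                   labels = c("D", "C", "B", "A"), right = FALSE))
}

set.seed(1)
B <- 10  # number of bootstrap iterations; 1000 in the real analysis
groups <- matrix(NA_character_, nrow = nrow(df), ncol = B)

for (b in seq_len(B)) {
  # Resample patients with replacement and refit the tree
  boot <- df[sample(nrow(df), replace = TRUE), ]
  tree <- ctree(Surv(time, cens) ~ ., data = boot,
                control = ctree_control(minsplit = 50, alpha = 0.1, multiway = TRUE))

  # Three-year Kaplan-Meier survival per terminal node, from the resample
  boot_node <- predict(tree, type = "node")
  surv3 <- sapply(split(boot, boot_node), function(d) {
    km <- survfit(Surv(time, cens) ~ 1, data = d)
    summary(km, times = 3 * 365, extend = TRUE)$surv
  })

  # Send every original patient down this tree and look up the node's group
  orig_node <- predict(tree, newdata = df, type = "node")
  groups[, b] <- surv_to_group(surv3[as.character(orig_node)])
}

# Per patient: proportion of iterations spent in each group
alloc <- sapply(c("A", "B", "C", "D"), function(g) rowMeans(groups == g))
head(alloc)
```

Row i of groups then holds the group patient i received in each iteration, and alloc summarises how often each allocation happened.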
Thank you very much,
Tobias Berg