I'm looking for a faster way to extract predicted survival distributions with mlr3 and mlr3proba.
Prediction is very time-consuming, especially on datasets with hundreds of observations and no ties in the time variable.
Is there an option to estimate each individual distribution only at predefined time points, rather than at every observed time?
If that is not possible, is there an option similar to ntime in [randomForestSRC::rfsrc][1]?
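To make the request concrete, here is a minimal sketch of the behaviour I am after, using two models outside mlr3 as stand-ins. This is an assumption for illustration only: neither model is the akritas learner, and the time points 1000 and 2638 and the value 50 are arbitrary. The first part evaluates Cox survival curves only at chosen times via summary() on a survfit object; the second part uses the ntime argument of rfsrc to restrict the time grid.

```r
library(survival)
library(randomForestSRC)

# Same column selection as in the example further down
dat <- survival::rotterdam[, -c(1, 2, 12, 13)]

# (1) Survival curves evaluated only at predefined times.
# A Cox model is a stand-in here, not the akritas estimator.
cox_fit <- coxph(Surv(dtime, death) ~ ., data = dat)
sf <- survfit(cox_fit, newdata = dat[1:5, ])
s <- summary(sf, times = c(1000, 2638))
s$surv  # one row per requested time, one column per new observation

# (2) Coarser time grid via ntime in rfsrc.
# ntree is kept small only so that the sketch runs quickly.
rf <- rfsrc(Surv(dtime, death) ~ ., data = dat, ntree = 50, ntime = 50)
length(rf$time.interest)  # number of time points actually used
dim(rf$survival.oob)      # observations x time points
```

What I am asking for is an equivalent of either mechanism when predicting with mlr3proba learners.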
Here is an example using survivalmodels::akritas, in which predicting at a single time point takes about 10 minutes:
pacman::p_load("survival", "mltools", "paradox", "mlr3misc", "mlr3tuning",
               "devtools", "mlr3extralearners", "mlr3proba", "mlr3learners",
               "survivalmodels", "mlr3pipelines", "tictoc", "casebase", "distr6")
dat <- survival::rotterdam[,-c(1,2,12,13)]
length(unique(dat$dtime)) # 2215 unique times
set.seed(220311)
sample.train <- sample(nrow(dat), floor(0.2 * nrow(dat))) # 20% of rows for training
dat_train <- dat[sample.train, ]
length(unique(dat_train$dtime)) # 558 unique times
sample.test <- setdiff(seq_len(nrow(dat)), sample.train) # remaining rows for testing
dat_test <- dat[sample.test, ]
length(unique(dat_test$dtime)) # 1875 unique times
task = mlr3proba::TaskSurv$new(id = "dat_train", backend = dat_train,
                               time = "dtime", event = "death")
search_space <- ps(
  lambda = p_dbl(lower = 0, upper = 0.25)
)
learner.dh <- lrn("surv.akritas", reverse = FALSE)
learner.dh$encapsulate = c(train = "evaluate")
at <- AutoTuner$new(
  learner = learner.dh,
  search_space = search_space,
  resampling = rsmp("cv", folds = 5),
  measure = msr("surv.cindex"),
  terminator = trm("evals", n_evals = 10), # n_evals kept very low, just for the example
  tuner = tnr("random_search")
)
tic()
at$train(task)
toc() #807.46 sec elapsed
tic()
pred.S_t2638 <- 1 - as.numeric(at$predict_newdata(dat_test)$distr$cdf(2638))
toc() #559.5 sec elapsed