mlr3 distrcompose cdf: subscript out of bounds

Question

Versions used: R 3.6.3, mlr3 0.4.0-9000, mlr3proba 0.1.6.9000, mlr3pipelines 0.1.2 and xgboost 0.90.0.2 (as stated on RStudio Package Manager).

I have deployed the following graph pipeline:

imputePipe = PipeOpImputeMean$new(id = "imputemean", param_vals = list())
survXGPipe = mlr_pipeops$get("learner",lrn("surv.xgboost"))

graphXG= Graph$new()$
  add_pipeop(imputePipe)$
  add_pipeop(po("learner", lrn("surv.kaplan")))$
  add_pipeop(survXGPipe)$
  add_pipeop(po("distrcompose"))$
  add_edge("imputemean","surv.kaplan")$
  add_edge("imputemean","surv.xgboost")$
  add_edge("surv.kaplan","distrcompose", dst_channel = "base")$
  add_edge("surv.xgboost","distrcompose", dst_channel = "pred")

Unfortunately, executing the following commands fails:

lrnXG = GraphLearner$new(graphXG)
trainResults = lrnXG$train(trainVerTask, row_ids = trainDataInd)
predictionResults = lrnXG$predict(trainVerTask, row_ids = verDataInd)

When the predict function is called, the following error is returned:

Error in cdf[i, ] : subscript out of bounds

This error seems to be specific to the distrcompose PipeOp: simple graphs that use only surv.xgboost and surv.kaplan do not trigger it.
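
For comparison, a graph without the distrcompose step might look like the sketch below. It is an illustration rather than the exact code that was run: the `%>>%` operator, `po("imputemean")`, and the names graphSimple, lrnSimple and predSimple do not appear in the original post, and only surv.kaplan is used to keep the example short.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3proba)

# Simulated survival task, generated the same way as in the reproduction code.
task = tgen("simsurv")$generate(1000)

# Mean imputation followed by a single survival learner; no distrcompose step.
graphSimple = po("imputemean") %>>% po("learner", lrn("surv.kaplan"))

# Wrap the graph as a learner, train on 900 rows, predict on the remaining 100.
lrnSimple = GraphLearner$new(graphSimple)
lrnSimple$train(task, row_ids = 1:900)
predSimple = lrnSimple$predict(task, row_ids = 901:1000)
print(predSimple)
```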

It also seems to be independent of the data: I tried changing the input data, and as long as distrcompose is used the same error is returned. Please let me know if you would like me to provide any further information concerning the matter. Thank you in advance for your time.

Please use the following code to reproduce the error:

library(mlr3)
library(mlr3pipelines)
library(mlr3proba)
library(mlr3learners)
task = tgen("simsurv")$generate(1000)
imputePipe = PipeOpImputeMean$new(id = "imputemean", param_vals = list())
survXGPipe = mlr_pipeops$get("learner",lrn("surv.xgboost"))

graphXG= Graph$new()$
  add_pipeop(imputePipe)$
  add_pipeop(po("learner", lrn("surv.kaplan")))$
  add_pipeop(survXGPipe)$
  add_pipeop(po("distrcompose"))$
  add_edge("imputemean","surv.kaplan")$
  add_edge("imputemean","surv.xgboost")$
  add_edge("surv.kaplan","distrcompose", dst_channel = "base")$
  add_edge("surv.xgboost","distrcompose", dst_channel = "pred")

lrnXG = GraphLearner$new(graphXG)
trainResults = lrnXG$train(task, row_ids = 1:900)
lrnXG$predict(task, row_ids = 901:1000)

score 1 · Accepted Answer · answered Jul 29 '20 at 08:51

The problem lies in distr6. Please install the latest versions of distr6 (1.4.2) and mlr3proba (0.2.0) from CRAN and then try again.
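
Whether the installed packages already meet these versions can be checked before reinstalling. The following is a minimal sketch using only base R. The helper name is_recent_enough is illustrative, and the commented install line assumes both packages are still available from the configured repository, which may have changed since this answer was written.

```r
# Packages named in the answer, with the versions it recommends as a minimum.
required = c(distr6 = "1.4.2", mlr3proba = "0.2.0")

# TRUE when `pkg` is installed at `min_version` or newer, FALSE otherwise.
is_recent_enough = function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

status = vapply(
  names(required),
  function(pkg) is_recent_enough(pkg, required[[pkg]]),
  logical(1)
)
print(status)

# Reinstall anything that is missing or outdated (uncomment to run).
# install.packages(names(status)[!status])
```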

RaphaelS

Thank you so much for your response. I removed my previous mlr3proba and distr6 packages and installed the newest version of mlr3proba from https://cran.r-project.org/web/packages/mlr3proba/ using the file mlr3proba_0.2.0.tar.gz. After the installation completed I had distr6 (1.4.2) and mlr3proba (0.2.0), but unfortunately the out-of-bounds error persists. – EvangelosK Jul 29 '20 at 09:31
Could you please provide a reprex including the task so I can try to reproduce this? – RaphaelS Jul 29 '20 at 10:55
I've run your code with a simulated task and had no problems, so I would definitely need a reprex from your side, preferably with session info using `reprex::reprex`. Then I might open this on GitHub if it is a genuine bug. – RaphaelS Jul 29 '20 at 11:04
Dear Raphael, thank you so much for your response. I will try to create a reprex, though it might take a while since I am quite new to R. One small note, however: regarding the simulated data you used for the task, would it be possible to try replicating the graph with the following data dimensions: 1000 samples, 50 variables, and an 80/20% training/verification split? I noticed that when the number of verification samples reaches the same order of magnitude as the number of training samples the error vanishes, but this leads to an impractical training/verification ratio. – EvangelosK Jul 29 '20 at 12:35
No problem, the simplest thing would just be to create a task similar to yours. When I tried it I used a task of only 20 samples so it's unlikely due to that. I suspect the error is actually due to other package versions. Could you please tell me what versions of `set6` and `mlr3learners` you are using? – RaphaelS Jul 29 '20 at 14:00
Let's bring this to GitHub to examine it properly: https://github.com/mlr-org/mlr3proba/issues/144. I will need a reprex at the minimum to solve this; there's an example on that issue showing one that I've made. – RaphaelS Jul 30 '20 at 10:58
Dear Raphael, sorry for responding here; it seems I have a temporary login issue with GitHub. However, I made a small change to your simulation code and managed to reproduce the exact same error. The reproduction code is now added at the end of the question description of this post. – EvangelosK Jul 30 '20 at 12:03
Thanks very much, bug is found and fixed. Please just install latest version with `remotes::install_github("mlr-org/mlr3proba")` and mark this as answered if it works for you. – RaphaelS Jul 30 '20 at 15:04
Dear Raphael, thank you so much for fixing the bug, I can confirm that it now works. – EvangelosK Jul 30 '20 at 16:49
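
The version details and the reprex requested in the comments above could be gathered along the following lines. This is a sketch, not what was actually run in the thread: the file name distrcompose_error.R and the helper installed_version are hypothetical, and the `session_info` argument assumes reprex 1.0.0 or later.

```r
# Packages whose versions are relevant to the discussion in the comments.
pkgs = c("mlr3", "mlr3proba", "mlr3pipelines", "mlr3learners", "distr6", "set6")

# Installed version as text, or NA when the package is not installed.
installed_version = function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions = vapply(pkgs, installed_version, character(1))
print(versions)

# Hypothetical path to a script containing the code that triggers the error.
script = "distrcompose_error.R"

# Render the script together with session information; this only runs in an
# interactive session where the script exists and reprex is installed.
if (interactive() && file.exists(script) &&
    requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = script, session_info = TRUE)
}
```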
