I am training an xgboost classifier via the mlr package. My data contain missing values, which I would like to preserve (that is, I want to keep these observations and avoid imputation). I understand that the xgboost implementation in mlr can handle missing values. However, I do not understand the warning raised by mlr's makeLearner function.

I have tried to read the documentation and have found this warning throughout other people's code. But I have not seen the warning addressed in a way that makes sense to me.

For example, I have read this discussion of the warning but it did not clarify things for me: https://github.com/mlr-org/mlr/pull/1225

The warning appears when calling the makeLearner function:

xgb_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "error",
    nrounds = 200,
    missing = NA,
    max_depth = 6,
    eta = 0.1,
    gamma = 5,
    colsample_bytree = 0.5,
    min_child_weight = 1,
    subsample = 0.7
  )
)
Warning in makeParam(id = id, type = "numeric", learner.param = TRUE, lower = lower,  :
  NA used as a default value for learner parameter missing.
ParamHelpers uses NA as a special value for dependent parameters.

My missing values are currently coded as NA, and R clearly recognizes them as such:

> sum(is.na(training$day))
[1] 58

From the getParamSet function, it seems that the parameter missing takes numeric values from -Inf to Inf. Thus, perhaps NA is not a valid value?

> getParamSet("classif.xgboost")
Warning in makeParam(id = id, type = "numeric", learner.param = TRUE, lower = lower,  :
  NA used as a default value for learner parameter missing.
ParamHelpers uses NA as a special value for dependent parameters.
                                Type  len             Def               Constr Req Tunable Trafo
booster                     discrete    -          gbtree gbtree,gblinear,dart   -    TRUE     -
watchlist                    untyped    -          <NULL>                    -   -   FALSE     -
eta                          numeric    -             0.3               0 to 1   -    TRUE     -
gamma                        numeric    -               0             0 to Inf   -    TRUE     -
max_depth                    integer    -               6             1 to Inf   -    TRUE     -
min_child_weight             numeric    -               1             0 to Inf   -    TRUE     -
subsample                    numeric    -               1               0 to 1   -    TRUE     -
colsample_bytree             numeric    -               1               0 to 1   -    TRUE     -
colsample_bylevel            numeric    -               1               0 to 1   -    TRUE     -
num_parallel_tree            integer    -               1             1 to Inf   -    TRUE     -
lambda                       numeric    -               1             0 to Inf   -    TRUE     -
lambda_bias                  numeric    -               0             0 to Inf   -    TRUE     -
alpha                        numeric    -               0             0 to Inf   -    TRUE     -
objective                    untyped    - binary:logistic                    -   -   FALSE     -
eval_metric                  untyped    -           error                    -   -   FALSE     -
base_score                   numeric    -             0.5          -Inf to Inf   -   FALSE     -
max_delta_step               numeric    -               0             0 to Inf   -    TRUE     -
missing                      numeric    -                          -Inf to Inf   -   FALSE     -

Do I need to recode these as a specific value that I then pass to mlr (through missing = [specific value] in makeLearner)? Do something else? Or is this warning not a cause for concern?

Thanks so very much for any clarification.

PBB

1 Answer

This warning comes from ParamHelpers and is harmless in this case. It's a standard check that doesn't take the particular case into account.
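
If the message clutters your logs, one option is to silence just this warning and let every other warning through. The following is a minimal sketch in base R, not something mlr provides: the helper name `muffle_matching_warning` is hypothetical, and the pattern is copied from the warning text shown in the question. The mlr call is guarded so that the sketch still runs where mlr or xgboost is not installed.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings whose message
# contains `pattern`; all other warnings are passed on as usual.
muffle_matching_warning <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage with the learner from the question (shortened parameter list).
if (requireNamespace("mlr", quietly = TRUE) &&
    requireNamespace("xgboost", quietly = TRUE)) {
  library(mlr)
  xgb_learner <- muffle_matching_warning(
    makeLearner(
      "classif.xgboost",
      predict.type = "prob",
      par.vals = list(objective = "binary:logistic", nrounds = 200, missing = NA)
    ),
    pattern = "NA used as a default value for learner parameter missing"
  )
}
```

Wrapping the call in `suppressWarnings()` would also work, but it hides every warning from `makeLearner`, including ones you might want to see.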

Lars Kotthoff
  • Would you mind expanding on what it is trying to tell me? In what case would it be of concern? – PBB Apr 07 '19 at 18:52
  • In many cases, NA may not be a sensible default value but is just added because the author of the code doesn't know better. This is not the case here. Even then, you could argue whether you really need the check. – Lars Kotthoff Apr 07 '19 at 19:01
  • Perhaps it is the way `ParamHelpers` deals with the `missing` parameter of `xgboost`, which is `NA` by default. I've tried setting it to another value, but the warning is still there. – xm1 Jul 17 '19 at 18:36