I am currently trying to work with the new xgboostExplainer package.

I am following the GitHub page here: https://github.com/AppliedDataSciencePartners/xgboostExplainer/blob/master/R/explainPredictions.R

On line 34, the xgboost model is fitted:

xgb.model <- xgboost(param =param, data = xgb.train.data, nrounds=3)

However, on line 43 I run into problems:

explainer = buildExplainer(xgb.model,xgb.train.data, type="binary", base_score = 0.5, n_first_tree = xgb.model$best_ntreelimit - 1)

I understand that n_first_tree is deprecated, but I cannot seem to access the xgb.model$best_ntreelimit - 1 part.

The fields I can access on the xgboost model are:

handle, raw, niter, evaluation_log, call, params, callbacks, feature_names

but not best_ntreelimit.
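
For context, a minimal sketch of why the expression yields nothing usable instead of raising an error. It assumes the fitted model behaves like an ordinary R list, as the field names above suggest; `model.stub` is a hypothetical stand-in, not a real booster.

```r
# Hypothetical stand-in for the fitted model: a plain list that has some of
# the fields listed above but no best_ntreelimit (assumption for illustration).
model.stub <- list(niter = 3, params = list(objective = "binary:logistic"))

# `$` on a list returns NULL for a name that does not exist; it does not error.
limit <- model.stub$best_ntreelimit
is.null(limit)          # TRUE

# Arithmetic on NULL yields a zero-length numeric vector, so the value that
# would be passed as n_first_tree is empty rather than an integer.
n_first_tree <- limit - 1
length(n_first_tree)    # 0
```

If the real model behaves the same way, n_first_tree receives an empty vector, which would explain why nothing sensible comes out of that argument.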

Has anybody else run into this issue?

EDIT:

Output of showWaterfall():

Extracting the breakdown of each prediction...
  |=============================================================| 100%

DONE!

Prediction:  NA
Weight:  NA
Breakdown
                       intercept                   cap-shape=bell 
                              NA                               NA 
               cap-shape=conical                 cap-shape=convex 
                              NA                               NA 
                  cap-shape=flat                cap-shape=knobbed 
                              NA                               NA 
                cap-shape=sunken              cap-surface=fibrous 
                              NA                               NA 
             cap-surface=grooves                cap-surface=scaly 
                              NA                               NA 
              cap-surface=smooth                  cap-color=brown 
                              NA                               NA 
                  cap-color=buff               cap-color=cinnamon 
                              NA                               NA 
                  cap-color=gray                  cap-color=green 
                              NA                               NA 
                  cap-color=pink                 cap-color=purple 
                              NA                               NA 
                   cap-color=red                  cap-color=white 
                              NA                               NA 
                cap-color=yellow                 bruises?=bruises 
                              NA                               NA 
                     bruises?=no                      odor=almond 
                              NA                               NA 
                      odor=anise                    odor=creosote 
                              NA                               NA 
                      odor=fishy                        odor=foul 
                              NA                               NA 
                      odor=musty                        odor=none 
                              NA                               NA 
                    odor=pungent                       odor=spicy 
                              NA                               NA 
        gill-attachment=attached       gill-attachment=descending 
                              NA                               NA 
            gill-attachment=free          gill-attachment=notched 
                              NA                               NA 
              gill-spacing=close             gill-spacing=crowded 
                              NA                               NA 
            gill-spacing=distant                  gill-size=broad 
                              NA                               NA 
                gill-size=narrow                 gill-color=black 
                              NA                               NA 
                gill-color=brown                  gill-color=buff 
                              NA                               NA 
            gill-color=chocolate                  gill-color=gray 
                              NA                               NA 
                gill-color=green                gill-color=orange 
                              NA                               NA 
                 gill-color=pink                gill-color=purple 
                              NA                               NA 
                  gill-color=red                 gill-color=white 
                              NA                               NA 
               gill-color=yellow            stalk-shape=enlarging 
                              NA                               NA 
            stalk-shape=tapering               stalk-root=bulbous 
                              NA                               NA 
                 stalk-root=club                   stalk-root=cup 
                              NA                               NA 
                stalk-root=equal           stalk-root=rhizomorphs 
                              NA                               NA 
               stalk-root=rooted               stalk-root=missing 
                              NA                               NA 
stalk-surface-above-ring=fibrous   stalk-surface-above-ring=scaly 
                              NA                               NA 
  stalk-surface-above-ring=silky  stalk-surface-above-ring=smooth 
                              NA                               NA 
stalk-surface-below-ring=fibrous   stalk-surface-below-ring=scaly 
                              NA                               NA 
  stalk-surface-below-ring=silky  stalk-surface-below-ring=smooth 
                              NA                               NA 
    stalk-color-above-ring=brown      stalk-color-above-ring=buff 
                              NA                               NA 
 stalk-color-above-ring=cinnamon      stalk-color-above-ring=gray 
                              NA                               NA 
   stalk-color-above-ring=orange      stalk-color-above-ring=pink 
                              NA                               NA 
      stalk-color-above-ring=red     stalk-color-above-ring=white 
                              NA                               NA 
   stalk-color-above-ring=yellow     stalk-color-below-ring=brown 
                              NA                               NA 
     stalk-color-below-ring=buff  stalk-color-below-ring=cinnamon 
                              NA                               NA 
     stalk-color-below-ring=gray    stalk-color-below-ring=orange 
                              NA                               NA 
     stalk-color-below-ring=pink       stalk-color-below-ring=red 
                              NA                               NA 
    stalk-color-below-ring=white    stalk-color-below-ring=yellow 
                              NA                               NA 
               veil-type=partial              veil-type=universal 
                              NA                               NA 
                veil-color=brown                veil-color=orange 
                              NA                               NA 
                veil-color=white                veil-color=yellow 
                              NA                               NA 
                ring-number=none                  ring-number=one 
                              NA                               NA 
                 ring-number=two               ring-type=cobwebby 
                              NA                               NA 
            ring-type=evanescent                ring-type=flaring 
                              NA                               NA 
                 ring-type=large                   ring-type=none 
                              NA                               NA 
               ring-type=pendant              ring-type=sheathing 
                              NA                               NA 
                  ring-type=zone          spore-print-color=black 
                              NA                               NA 
         spore-print-color=brown           spore-print-color=buff 
                              NA                               NA 
     spore-print-color=chocolate          spore-print-color=green 
                              NA                               NA 
        spore-print-color=orange         spore-print-color=purple 
                              NA                               NA 
         spore-print-color=white         spore-print-color=yellow 
                              NA                               NA 
             population=abundant             population=clustered 
                              NA                               NA 
             population=numerous             population=scattered 
                              NA                               NA 
              population=several              population=solitary 
                              NA                               NA 
                 habitat=grasses                   habitat=leaves 
                              NA                               NA 
                 habitat=meadows                    habitat=paths 
                              NA                               NA 
                   habitat=urban                    habitat=waste 
                              NA                               NA 
                   habitat=woods 
                              NA 
-3.89182 -3.178054 -2.751535 -2.442347 -2.197225 -1.99243 -1.81529 -1.658228 -1.516347 -1.386294 -1.265666 -1.15268 -1.045969 -0.9444616 -0.8472979 -0.7537718 -0.6632942 -0.5753641 -0.4895482 -0.4054651 -0.3227734 -0.2411621 -0.1603427 -0.08004271 0 0.08004271 0.1603427 0.2411621 0.3227734 0.4054651 0.4895482 0.5753641 0.6632942 0.7537718 0.8472979 0.9444616 1.045969 1.15268 1.265666 1.386294 1.516347 1.658228 1.81529 1.99243 2.197225 2.442347 2.751535 3.178054 3.89182
Error in if (abs(values[i]) > put_rect_text_outside_when_value_below) { : 
  missing value where TRUE/FALSE needed
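
The error message itself comes from R's `if`, which cannot branch on a missing value. A minimal sketch reproducing it outside the package follows; `values` and `threshold` are stand-ins for the plotting code's internals, which are not shown in this post.

```r
values <- c(NA_real_, 0.4)   # stand-in for a breakdown that contains NA
threshold <- 0.1             # stand-in for put_rect_text_outside_when_value_below

# Comparing NA with a number gives NA, and `if` refuses an NA condition.
msg <- tryCatch(
  {
    if (abs(values[1]) > threshold) "outside" else "inside"
  },
  error = function(e) conditionMessage(e)
)
msg

# Checking for NA first avoids the error and points at the real problem:
# the breakdown values are missing before any plotting happens.
anyNA(values)   # TRUE
```

So the plotting call is only where the problem surfaces; the NA values in the breakdown printed above are the thing to investigate.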

EDIT: Here is the code I ran:

library(xgboost)
library(xgboostExplainer)  # provides buildExplainer(), explainPredictions(), showWaterfall()
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

xgb.train.data <- xgb.DMatrix(train$data, label = train$label)
xgb.test.data <- xgb.DMatrix(test$data, label = test$label)
param <- list(objective = "binary:logistic")

model.cv <- xgb.cv(params = param,
                   data = xgb.train.data,
                   nrounds = 500,
                   early_stopping_rounds = 10,
                   nfold = 3)

model.cv$best_ntreelimit

xgb.model <- xgboost(params = param, data = xgb.train.data, nrounds = 10)

explained <- buildExplainer(xgb.model, xgb.train.data, type="binary", base_score = 0.5, n_first_tree = 9)

pred.breakdown = explainPredictions(xgb.model,
                                    explained,
                                    xgb.test.data)

showWaterfall(xgb.model,
              explained,
              xgb.test.data, test$data,  2, type = "binary")
user113156

1 Answer

I tested the code in the linked page. best_ntreelimit is a parameter returned by xgb.cv when early_stopping_rounds is set. From the help of xgb.cv:

best_ntreelimit: the ntreelimit value corresponding to the best iteration, which could further be used in predict method (only available with early stopping).

You can get to it by using xgb.cv:

library(xgboost)
library(xgboostExplainer)  # provides buildExplainer(), explainPredictions(), showWaterfall()
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

xgb.train.data <- xgb.DMatrix(train$data, label = train$label)
param <- list(objective = "binary:logistic")

model.cv <- xgb.cv(params = param,
                   data = xgb.train.data,
                   nrounds = 500,
                   early_stopping_rounds = 10,
                   nfold = 3)

model.cv$best_ntreelimit
#output
9

However, the output of xgb.cv cannot be used to build an explainer.

So you need to train a model with xgboost():

xgb.model <- xgboost(params = param, data = xgb.train.data, nrounds = 10)

and set n_first_tree to an integer:

explained <- buildExplainer(xgb.model, xgb.train.data, type="binary", base_score = 0.5, n_first_tree = 9)

EDIT: I failed to paste the following code:

xgb.test.data <- xgb.DMatrix(test$data, label = test$label)

pred.breakdown = explainPredictions(xgb.model,
                                    explained,
                                    xgb.test.data)

and now you can do:

showWaterfall(xgb.model,
              explained,
              xgb.test.data, test$data, 2, type = "binary")

missuse
  • Hi, thanks for the clearer explanation. I have followed your code, but when I run `showWaterfall()` I get the following error: `Error in if (abs(values[i]) > put_rect_text_outside_when_value_below) { : missing value where TRUE/FALSE needed`, which is a result of the data having only `NA` values. I have pasted the output in the original post. – user113156 Mar 22 '18 at 14:41
  • @user113156 Hey, check edit please, it is a problem on my part I forgot to paste one line of the code from my session. I apologize. Please check if working now? If not try to update to latest packages. – missuse Mar 22 '18 at 15:05
  • Hi, apologies for the delayed response. I went back to this issue and it is still giving me the same error. I have pasted the code in the original post to show exactly what I am doing. – user113156 Apr 02 '18 at 14:19
  • The code you posted runs fine for me: `R version 3.4.2 (2017-09-28) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) other attached packages: [1] xgboost_0.6.4.1 xgboostExplainer_0.1` – missuse Apr 02 '18 at 14:24
  • Strange: `R version 3.4.1 Patched (2017-08-30 r73162) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)` – user113156 Apr 02 '18 at 14:27
  • and package versions? – missuse Apr 02 '18 at 14:37
  • Currently updating my R version to the latest, will get the package versions now – user113156 Apr 02 '18 at 14:42
  • `other attached packages: [1] xgboostExplainer_0.1 xgboost_0.6.4.6 RDocumentation_0.8.0` – user113156 Apr 02 '18 at 14:49
  • I have had problems with R and Java before where I had to force a change from the 64-bit to the 32-bit version, etc. Could something similar be occurring again? – user113156 Apr 02 '18 at 14:51
  • Do you get the following? `sum(is.na(pred.breakdown)) [1] 204597` – user113156 Apr 02 '18 at 15:06
  • Hi, thanks @missuse! I just went back through this problem again and re-installed my `xgboost` package to the same version as you had - version `0.6.4.1` and it works! I had the updated `xgboost` version from `github` - version `0.6.4.6` not the version on `cran`. – user113156 Apr 12 '18 at 18:19
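
Given the resolution above, a small sketch for checking which xgboost version is in play before debugging further. `known_good` reflects only the version reported as working in this thread, and `newer_than_known_good` is a hypothetical helper, not part of any package.

```r
# Version reported as working in this thread (an observation, not a guarantee).
known_good <- "0.6.4.1"

# Hypothetical helper: TRUE when the installed version is newer than the reference.
newer_than_known_good <- function(installed, reference = known_good) {
  package_version(installed) > package_version(reference)
}

newer_than_known_good("0.6.4.6")   # TRUE  (the GitHub build that produced NA)
newer_than_known_good("0.6.4.1")   # FALSE (the CRAN build that worked)

# With xgboost installed, the same check on the local version would be:
# newer_than_known_good(as.character(packageVersion("xgboost")))
# One way to pin a specific CRAN release (assumes the remotes package is installed):
# remotes::install_version("xgboost", version = "0.6.4.1")
```

A newer version is not necessarily broken; the comparison is only a quick way to spot a mismatch with the setup that worked for the answerer.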