
Forewarning: I'm fairly new to both xgboost and R.

I am using xgboost in R to perform a multinomial classification on my data dtrain. The label I am using has six levels, so my code looks like this:

param1 <- list(objective = "multi:softprob"
          , num_class = 6
          , booster = "gbtree"  
          , eta = 0.5
          , max.depth = 7
          , min_child_weight = 10
          , max_delta_step = 5
          , subsample = 0.8
          , colsample_bytree = 0.8
          , lambda  = 3 # L2
          , alpha = 5 # L1
)
set.seed(2016)    
xgbcv1 <- xgb.cv(params = param1, data = dtrain, nround = 3000, nfold = 3,
             metrics = list("error", "auc"), maximize = T, 
             print_every_n = 10, early_stopping_rounds = 10)

This throws me the following error:

Error in xgb.iter.update(fd$bst, fd$dtrain, iteration - 1, obj) : 
amalgamation/../src/objective/multiclass_obj.cc:75: Check failed: 
label_error >= 0 && label_error < nclass SoftmaxMultiClassObj: label must be in [0, num_class), num_class=6 but found 6 in label.

So I tried setting num_class = 7, which throws this error:

Error in xgb.iter.eval(fd$bst, fd$watchlist, iteration - 1, feval) : 
amalgamation/../src/metric/elementwise_metric.cc:28: Check failed: 
(preds.size()) == (info.labels.size()) label and prediction size not match, hint: use merror or mlogloss for multi-class classification

What's going on here? Does num_class need to be greater than label_error or equal to it?
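The first message comes from xgboost's check that every label lies in [0, num_class). The base-R sketch below mirrors that condition with a hypothetical helper, `check_labels()` (not part of xgboost). It shows why labels coded 1 to 6 fail with num_class = 6, while the same labels shifted down by one pass.

```r
# Hypothetical helper (not part of xgboost): mirrors the condition in the
# first error message, which requires every label to be an integer in [0, num_class).
check_labels <- function(labels, num_class) {
  ok <- labels >= 0 & labels < num_class & labels == round(labels)
  if (!all(ok)) {
    bad <- unique(labels[!ok])
    stop(sprintf("labels must be integers in [0, %d); found: %s",
                 as.integer(num_class), paste(bad, collapse = ", ")))
  }
  invisible(TRUE)
}

# Six classes coded 1..6 (as as.numeric() on a factor would code them) fail...
labels_1_based <- 1:6
msg <- tryCatch(check_labels(labels_1_based, num_class = 6),
                error = function(e) conditionMessage(e))
print(msg)  # "labels must be integers in [0, 6); found: 6"

# ...while the same labels shifted down by one pass with num_class = 6.
print(check_labels(labels_1_based - 1, num_class = 6))  # TRUE
```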

data princess
  • Please use `dput()` to share your data,... – Tonio Liebrand Aug 24 '17 at 18:18
  • Did you overcome it in the end? I found a solution that worked for me: for my y variable, I set `y <- y - 1` (I didn't see a target variable in your call to xgb). https://stackoverflow.com/questions/36086529/understanding-num-classes-for-xgboost-in-r – Doug Fir Sep 05 '17 at 10:25

3 Answers


XGBoost requires class labels to be integers that start from 0 and increase sequentially up to num_class - 1; that is, every label must lie in [0, num_class). This is a bit of an inconvenience, as you need to keep track of which class name goes with which label.

Convert your class target variable to numeric and subtract 1 from it.

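library(dplyr)
# Factor codes run 1..k; shift them down to the 0..k-1 range xgboost expects
df$class_numeric <- as.numeric(as.factor(df$class_target))
df <- df %>% mutate(class_numeric = class_numeric - 1)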
  • Key point here! No negative classes. I just spent a few hours looking for the issue with classes labelled -1, 0, 1 – Mayeul sgc Sep 18 '19 at 14:46
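To keep track of which class name goes with which label, one option is to keep the factor levels as a lookup table. The base-R sketch below uses a made-up character target (`class_target` here is hypothetical example data). It converts the target to 0-based labels, derives num_class from the number of levels, and maps 0-based predicted labels back to class names.

```r
# Hypothetical example target with character class names.
class_target <- c("cat", "dog", "bird", "dog", "cat", "fish")

# factor() fixes the class order; levels() doubles as the label -> name lookup.
class_factor <- factor(class_target)
class_names <- levels(class_factor)   # "bird" "cat" "dog" "fish"

# as.integer() on a factor gives codes 1..k; subtract 1 for labels 0..k-1.
label <- as.integer(class_factor) - 1L
num_class <- length(class_names)      # equals the number of classes

print(label)      # 1 2 0 2 1 3
print(num_class)  # 4

# Map 0-based predicted labels (e.g. each softprob row's argmax) back to names.
pred_label <- c(0L, 3L, 1L)
print(class_names[pred_label + 1L])   # "bird" "fish" "cat"
```

Because the lookup is just the level vector, decoding is a single index shift (`+ 1L`) back into R's 1-based indexing.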

If the number of levels in the dependent variable is 6, then give num_class = 7; that is, specify num_class = nlevels(dependent_variable) + 1.

Vasu

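Try setting metrics = list("mlogloss"). The second error's hint says to use merror or mlogloss for multi-class classification: the binary "error" metric expects one prediction per label, which a multi-class objective does not produce.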
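With objective = "multi:softprob", the model outputs num_class probabilities per row. The flattened prediction vector is therefore num_class times longer than the label vector, which is the size mismatch the second error reports. The base-R sketch below uses made-up probabilities and hypothetical helper functions named after the metrics. It approximates what merror and mlogloss measure rather than reproducing xgboost's internal implementation.

```r
# Made-up softprob output: one row per example, one column per class (num_class = 3).
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4,
                  0.5, 0.4, 0.1),
                ncol = 3, byrow = TRUE)
y <- c(0L, 1L, 2L, 1L)   # 0-based class labels

# Flattened, there are nrow * num_class predictions rather than one per label.
cat(length(probs), "predictions vs", length(y), "labels\n")

# Hypothetical merror: share of rows whose most probable class is not the label.
merror <- function(probs, y) {
  mean(max.col(probs, ties.method = "first") - 1L != y)
}

# Hypothetical mlogloss: mean negative log probability given to the true class
# (probabilities are floored at eps to avoid log(0)).
mlogloss <- function(probs, y, eps = 1e-15) {
  p_true <- probs[cbind(seq_along(y), y + 1L)]
  -mean(log(pmax(p_true, eps)))
}

print(merror(probs, y))    # 0.25
print(mlogloss(probs, y))
```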
Dadep
  • Welcome to Stack Overflow! Please don't answer just with source code. Try to provide a nice description about how your solution works. See: [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). Thanks – sɐunıɔןɐqɐp Sep 25 '18 at 07:21