xgboost multinomial classification error: "label and prediction size not match"

Question

Foreword: I'm fairly new to both xgboost and R.

I am using xgboost in R to perform a multinomial classification on my data dtrain. The label I am using has six levels, so my code looks like this:

param1 <- list(objective = "multi:softprob"
          , num_class = 6
          , booster = "gbtree"  
          , eta = 0.5
          , max.depth = 7
          , min_child_weight = 10
          , max_delta_step = 5
          , subsample = 0.8
          , colsample_bytree = 0.8
          , lambda  = 3 # L2
          , alpha = 5 # L1
)
set.seed(2016)    
xgbcv1 <- xgb.cv(params = param1, data = dtrain, nround = 3000, nfold = 3,
             metrics = list("error", "auc"), maximize = T, 
             print_every_n = 10, early_stopping_rounds = 10)

This throws me the following error:

Error in xgb.iter.update(fd$bst, fd$dtrain, iteration - 1, obj) : 
amalgamation/../src/objective/multiclass_obj.cc:75: Check failed: 
label_error >= 0 && label_error < nclass SoftmaxMultiClassObj: label must be in [0, num_class), num_class=6 but found 6 in label.

So I tried setting num_class = 7, which throws this error:

Error in xgb.iter.eval(fd$bst, fd$watchlist, iteration - 1, feval) : 
amalgamation/../src/metric/elementwise_metric.cc:28: Check failed: 
(preds.size()) == (info.labels.size()) label and prediction size not match, hint: use merror or mlogloss for multi-class classification

What's going on here? Does num_class need to be greater than label_error or equal to it?

Did you overcome it in the end? I found a solution that worked for me. Specifically, for my y variable I set `y <- y - 1` (I didn't see a target variable in your call to xgb): https://stackoverflow.com/questions/36086529/understanding-num-classes-for-xgboost-in-r – Doug Fir, Sep 05 '17 at 10:25

score 2 · Answer 1 · answered Mar 29 '19 at 09:53

XGBoost requires class labels to be integers that start at 0 and run consecutively up to num_class - 1. This is a bit of an inconvenience, as you need to keep track of which class name goes with which label.

Convert your class target variable to numeric and subtract 1 from it.

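library(dplyr)  # provides %>% and mutate()
# as.numeric() on a factor returns its 1-based level codes
df$class_numeric <- as.numeric(df$class_target)
df <- df %>% mutate(class_numeric = class_numeric - 1)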
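
To keep track of which class name goes with which label, the factor levels themselves can serve as the lookup table. The following is a minimal base-R sketch, assuming the target is stored as a factor; `class_target`, `label`, and `predicted_label` are hypothetical names and the data is made up for illustration.

```r
# Hypothetical example data: a factor target with six levels
class_target <- factor(c("a", "c", "f", "b", "e", "d", "a", "f"))

# Factor codes are 1-based in R; subtract 1 to get labels in [0, num_class)
label <- as.integer(class_target) - 1L
num_class <- nlevels(class_target)

# Sanity check before training: every label must lie in 0 .. num_class - 1
stopifnot(min(label) >= 0, max(label) < num_class)

# Map a 0-based predicted label back to the original class name
predicted_label <- 5L  # hypothetical prediction
predicted_class <- levels(class_target)[predicted_label + 1L]
```

Because the mapping is just the order of `levels(class_target)`, no separate lookup table needs to be maintained, provided the factor levels are not reordered between training and prediction.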

answered Mar 29 '19 at 09:53

Satish Chilloji

Key point here: no negative classes! I just spent a few hours looking for the issue with classes labelled -1, 0, 1 – Mayeul sgc Sep 18 '19 at 14:46

score -1 · Answer 2 · answered Nov 14 '17 at 12:08

If the dependent variable has 6 levels, set num_class = 7. In other words, specify num_class = (number of levels in the dependent variable) + 1.

answered Nov 14 '17 at 12:08

Vasu

score -1 · Answer 3 · edited Sep 25 '18 at 07:58

Try setting metrics = list("mlogloss") in the call to xgb.cv.
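
A minimal sketch of this suggestion, assuming the xgboost package is installed. The built-in iris data (three classes) stands in for the asker's dtrain, and the parameter values are illustrative only. The likely reason it helps: multi:softprob produces num_class probabilities per row, so a binary metric such as "error" sees more predictions than labels, which matches the hint in the error message to use merror or mlogloss.

```r
library(xgboost)

# Stand-in data: iris has three classes; labels are made 0-based
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L
dtrain <- xgb.DMatrix(data = x, label = y)

params <- list(
  objective = "multi:softprob",
  num_class = length(unique(y)),  # 3 here; 6 for the data in the question
  eta = 0.5,
  max_depth = 3
)

set.seed(2016)
cv <- xgb.cv(
  params = params, data = dtrain, nrounds = 50, nfold = 3,
  metrics = list("merror", "mlogloss"),  # multi-class metrics
  maximize = FALSE,                      # both metrics are minimized
  early_stopping_rounds = 10, verbose = FALSE
)

head(cv$evaluation_log)
```

Note that both metrics are losses, so `maximize = FALSE` is used here, unlike the `maximize = T` in the question.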

edited Sep 25 '18 at 07:58

Dadep

answered Sep 25 '18 at 07:05

user10411737

Welcome to Stack Overflow! Please don't answer just with source code. Try to provide a nice description about how your solution works. See: [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). Thanks – sɐunıɔןɐqɐp Sep 25 '18 at 07:21
