
I am working on a binary classification problem in LightGBM (scikit-learn API) and am having trouble understanding how to include sample weights. My code currently looks like this:

```python
from lightgbm import LGBMClassifier

classifier = LGBMClassifier(n_estimators=100, learning_rate=0.1, num_leaves=15)
classifier.fit(X_train, y_train, sample_weight=w_train, eval_set=[(X_val, y_val)])
```

Here w_train is a NumPy array with the same length as y_train. But I also need LightGBM to use sample weights on the validation set, so I want to pass eval_sample_weight to fit. I expected this to be an array w_val (with the same length as y_val), but the documentation says it is a list of arrays. I cannot find any examples using it, so I struggle to understand why. As I understand it, this should just be one weight for each element of the validation set. Does a list of arrays mean multiple weights for each sample? Can anyone explain?

Petter T

2 Answers


Figured this out myself. LightGBM accepts a list of validation sets in eval_set, so eval_sample_weight is correspondingly a list of weight arrays: one array per validation set, in the same order.
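To make the pairing concrete, here is a minimal numpy sketch with two validation sets. `pair_eval_sets` is a hypothetical helper, not part of LightGBM; it only builds the two parallel lists the scikit-learn API expects and checks that each weight array has one entry per row of its validation set. The random arrays stand in for real validation data.

```python
import numpy as np

def pair_eval_sets(val_sets, val_weights):
    """Hypothetical helper: build eval_set and eval_sample_weight lists.

    val_sets    -- list of (X, y) tuples, one per validation set
    val_weights -- list of 1-D weight arrays, in the same order
    """
    if len(val_sets) != len(val_weights):
        raise ValueError("need exactly one weight array per validation set")
    eval_set, eval_sample_weight = [], []
    for (X, y), w in zip(val_sets, val_weights):
        w = np.asarray(w, dtype=float)
        # Each weight array must match the number of rows of its own validation set
        if w.shape != (len(y),):
            raise ValueError("weight array length must match its validation set")
        eval_set.append((X, y))
        eval_sample_weight.append(w)
    return eval_set, eval_sample_weight

# Made-up validation data: two sets of different sizes
rng = np.random.default_rng(0)
X_val1, y_val1 = rng.normal(size=(50, 4)), rng.integers(0, 2, size=50)
X_val2, y_val2 = rng.normal(size=(30, 4)), rng.integers(0, 2, size=30)
w_val1, w_val2 = rng.uniform(0.5, 2.0, size=50), rng.uniform(0.5, 2.0, size=30)

eval_set, eval_sample_weight = pair_eval_sets(
    [(X_val1, y_val1), (X_val2, y_val2)], [w_val1, w_val2]
)
# The lists line up index by index: eval_sample_weight[i] weights eval_set[i]
# classifier.fit(X_train, y_train, sample_weight=w_train,
#                eval_set=eval_set, eval_sample_weight=eval_sample_weight)
```

So "a list of arrays" does not mean several weights per sample; it means one weight array per validation set.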

Petter T
  • Do you know whether the weights need to satisfy any constraints, or can they be any positive numbers as long as their ratios reflect what we need? (My guess is that only the ratios matter, since the loss is additive.) – abhgh Feb 28 '19 at 03:23
  • Hi, I have the same problem. Could you share the code showing how you solved it? Thank you. – Shenan Sep 05 '19 at 19:55
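On the question of whether only the ratio of the weights matters: for an evaluation metric defined as a weighted mean of per-sample losses, sum(w * loss) / sum(w), it does, because a common factor cancels. I believe LightGBM's built-in `binary_logloss` is normalized this way, but treat that as an assumption. The numpy sketch below computes such a weighted log loss by hand (`weighted_logloss` is a hypothetical helper, not a LightGBM function) and shows that scaling all weights by a constant leaves it unchanged.

```python
import numpy as np

def weighted_logloss(y_true, p_pred, weights):
    """Hypothetical helper: weighted mean binary log loss, sum(w * loss) / sum(w)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0)
    p = np.clip(np.asarray(p_pred, dtype=float), 1e-15, 1 - 1e-15)
    w = np.asarray(weights, dtype=float)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.sum(w * losses) / np.sum(w))

# Made-up labels, predicted probabilities and weights
y = np.array([0, 1, 1, 0])
p = np.array([0.2, 0.7, 0.9, 0.4])
w = np.array([1.0, 2.0, 0.5, 3.0])

# Multiplying every weight by the same constant gives the same weighted mean
print(weighted_logloss(y, p, w))
print(weighted_logloss(y, p, 100 * w))
```

For training, the ratio may not be the whole story: parameters such as `min_child_weight` (minimum sum of hessians in a leaf) and `reg_lambda` act on absolute sums of weighted hessians and gradients, so rescaling all weights can change the fitted trees when those thresholds come into play.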

Petter T is right: it should be a list of arrays, one array per validation set in eval_set. So the code should look like the following:

```python
classifier.fit(X_train, y_train, sample_weight=w_train,
               eval_set=[(X_val, y_val)], eval_sample_weight=[w_val])
```

where, as described above, w_val has the same shape as y_val.
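Here is a self-contained sketch of that fit call on synthetic data, for anyone who wants something runnable. The random arrays and the 200/100 train/validation split are made up for illustration. It assumes a LightGBM version whose scikit-learn `fit` accepts `eval_sample_weight`, and reads the weighted validation metric back from `evals_result_`. The import is guarded only so that the data-preparation part still runs where LightGBM is not installed.

```python
import numpy as np

try:
    from lightgbm import LGBMClassifier
except ImportError:  # LightGBM not installed: only the data preparation below runs
    LGBMClassifier = None

# Synthetic binary data; all array names and sizes are made up for this sketch
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
w = rng.uniform(0.5, 2.0, size=300)  # one weight per row

X_train, y_train, w_train = X[:200], y[:200], w[:200]
X_val, y_val, w_val = X[200:], y[200:], w[200:]
assert w_val.shape == y_val.shape  # one validation weight per validation row

history = []
if LGBMClassifier is not None:
    classifier = LGBMClassifier(n_estimators=20, learning_rate=0.1,
                                num_leaves=15, verbose=-1)
    classifier.fit(X_train, y_train, sample_weight=w_train,
                   eval_set=[(X_val, y_val)], eval_sample_weight=[w_val])
    # Unnamed validation sets are reported as "valid_0", "valid_1", ...
    history = classifier.evals_result_["valid_0"]["binary_logloss"]
    print(f"final weighted validation log loss: {history[-1]:.4f}")
```

Passing `eval_names=[...]` to `fit` would replace the default `valid_0` key with a name of your choosing.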

Rafa
  • In a cross-validation setting, `eval_sample_weight` must be given the weights for the validation part of each fold's data. – mirekphd Aug 30 '21 at 14:18
  • By the way, the XGBoost equivalent of `eval_sample_weight` is called `sample_weight_eval_set`. – mirekphd Aug 30 '21 at 14:19
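A sketch of the per-fold slicing described in the cross-validation comment, assuming NumPy arrays X, y and w of equal length. `kfold_indices` is a hypothetical stand-in for something like scikit-learn's `KFold`; the point is only that the training weights, the validation set and the validation weights are all sliced with the same fold indices. The fit call is left commented out because `classifier` is not defined here.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    """Hypothetical helper: yield (train_idx, val_idx) for a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, val_idx

# Made-up data: 100 rows, one weight per row
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)
w = rng.uniform(0.5, 2.0, size=100)

fold_args = []
for train_idx, val_idx in kfold_indices(len(y), n_splits=5):
    # Slice the weights with the SAME indices used for X and y
    fit_kwargs = dict(
        sample_weight=w[train_idx],
        eval_set=[(X[val_idx], y[val_idx])],
        eval_sample_weight=[w[val_idx]],
    )
    fold_args.append((train_idx, val_idx, fit_kwargs))
    # classifier.fit(X[train_idx], y[train_idx], **fit_kwargs)
```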