
I am using Python scikit-learn's GradientBoostingClassifier with subsampling enabled (stochastic gradient boosting), and a sample_weight of 1 for one of the binary classes (outcome = 0) and 20 for the other class (outcome = 1). My question is: how are these weights applied, in layman's terms?

Is it that at each iteration, the model selects x rows for the 0 outcome and y rows for the 1 outcome, and then the sample_weight setting kicks in, keeping all of x but oversampling the y (outcome = 1) rows by a factor of 20? See the sketch below for the two behaviours I mean.

From the documentation it is not clear to me whether sample_weight > 1 amounts to oversampling. I understand that class_weight is different: it does not change the data, only how the model interprets errors via the loss function. Is it true that sample_weight, on the other hand, effectively changes the data fed into the model by oversampling?
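To make the distinction concrete, here is a toy sketch (synthetic data, not my actual setup) of the two behaviours I am asking about. I would expect (a) and (b) to behave similarly for integer weights, but I want to confirm which one sample_weight actually does:

```python
# Toy sketch (hypothetical data) contrasting per-row loss weights
# with literal oversampling of the minority class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# (a) weights: the data is unchanged, class-1 rows get weight 20 in the loss
w = np.where(y == 1, 20.0, 1.0)
clf_w = GradientBoostingClassifier(random_state=0).fit(X, y, sample_weight=w)

# (b) oversampling: class-1 rows are literally repeated 20 times
reps = np.where(y == 1, 20, 1)
clf_o = GradientBoostingClassifier(random_state=0).fit(
    np.repeat(X, reps, axis=0), np.repeat(y, reps))

# with deterministic trees the two typically agree closely,
# though hyperparameters that count rows (e.g. min_samples_leaf)
# can make them differ
print(np.abs(clf_w.predict_proba(X)[:, 1] - clf_o.predict_proba(X)[:, 1]).max())
```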

Thanks

1 Answer


Sample weights act as a multiplicative factor in the loss; here is the code:

https://github.com/scikit-learn/scikit-learn/blob/f0ab589f/sklearn/ensemble/gradient_boosting.py#L1225
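For what it's worth, here is a rough paraphrase (my own sketch, not a verbatim copy of the linked gradient_boosting.py; the function name weighted_binomial_deviance is mine) of how the binomial deviance applies sample weights: each row's loss term is multiplied by its weight before averaging.

```python
# Hedged paraphrase of the "multiplier factor" in the deviance loss:
# a weight of 20 makes that row count 20x in the weighted average,
# as if it appeared 20 times, without duplicating any data.
import numpy as np

def weighted_binomial_deviance(y, raw_pred, sample_weight):
    # per-row negative log-likelihood of a logistic model;
    # np.logaddexp(0, p) computes log(1 + exp(p)) stably
    per_row = -2.0 * (y * raw_pred - np.logaddexp(0.0, raw_pred))
    # the weights multiply each row's contribution to the average
    return np.average(per_row, weights=sample_weight)

y = np.array([0.0, 1.0])
pred = np.array([-2.0, -2.0])      # raw (log-odds) predictions
w_flat = np.array([1.0, 1.0])
w_up = np.array([1.0, 20.0])       # upweight the positive row
print(weighted_binomial_deviance(y, pred, w_flat))  # ~2.25
print(weighted_binomial_deviance(y, pred, w_up))    # ~4.06, dominated by row 2
```

Note also that, as far as I can tell from the scikit-learn source, the stochastic subsample mask is drawn uniformly at random regardless of sample_weight, so the weights change the loss and gradients, not which rows are drawn.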

– Alessandro
  • Thanks. By "multiplier factor", are you confirming that sample_weight modifies how the algorithm penalizes errors on that class, as opposed to feeding more data into the trees by oversampling from that class? If you are able to highlight a few code examples that demonstrate this, that would be highly appreciated. – Luke Boston Aug 31 '18 at 00:08
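One quick empirical check of the comment's question (my own sketch, not from the answer): the fitted trees expose both a raw row count and a weighted count per node, so you can see that weighting does not add rows.

```python
# Inspect the first fitted tree: n_node_samples counts actual rows
# reaching the root, while weighted_n_node_samples is the sum of
# their sample weights, showing weights scale contributions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
w = np.where(y == 1, 20.0, 1.0)

clf = GradientBoostingClassifier(random_state=0).fit(X, y, sample_weight=w)
root = clf.estimators_[0, 0].tree_

print(root.n_node_samples[0])           # 500: same rows as the raw data
print(root.weighted_n_node_samples[0])  # sum(w): only the weighting changed
```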