The sample set we use for modeling is drawn from a much larger dataset. When we build a scorecard model with logistic regression, we usually account for the difference in the good-to-bad ratio between the sample set and the whole dataset by applying a correction factor or a sample weight.

I think this weight should also affect how the trees split in LightGBM if the model is going to be applied to the whole dataset. For example, say we have a feature called "age". Without weights, the best split might be at 32, but the weights tell us that in the whole dataset (the population) more samples are below 25, so the best split should be at 28.
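To make the "age" example concrete, here is a toy sketch of how per-row weights can move the best threshold on a single feature. It uses weighted Gini impurity purely for illustration; LightGBM itself picks splits from gradient and hessian statistics over histogram bins, so this is not its actual criterion. The function names, the data, and the weights are all made up for the example, and I am assuming label 1 means "bad".

```python
def weighted_gini(pos_w, neg_w):
    """Gini impurity of a node, given the total weight of each class."""
    total = pos_w + neg_w
    if total == 0:
        return 0.0
    p = pos_w / total
    return 2.0 * p * (1.0 - p)


def best_split(values, labels, weights=None):
    """Return the threshold that minimizes weighted Gini impurity."""
    if weights is None:
        weights = [1.0] * len(values)  # unweighted case
    rows = sorted(zip(values, labels, weights))
    total_pos = sum(w for _, y, w in rows if y == 1)
    total_neg = sum(w for _, y, w in rows if y == 0)
    total = total_pos + total_neg

    best_thr, best_score = None, float("inf")
    left_pos = left_neg = 0.0
    for i in range(len(rows) - 1):
        x, y, w = rows[i]
        if y == 1:
            left_pos += w
        else:
            left_neg += w
        nxt = rows[i + 1][0]
        if nxt == x:
            continue  # cannot split between equal feature values
        right_pos = total_pos - left_pos
        right_neg = total_neg - left_neg
        # Impurity of both children, each weighted by its share of total weight.
        score = (
            (left_pos + left_neg) * weighted_gini(left_pos, left_neg)
            + (right_pos + right_neg) * weighted_gini(right_pos, right_neg)
        ) / total
        if score < best_score:
            best_score, best_thr = score, (x + nxt) / 2
    return best_thr


# Hypothetical toy data: 1 = bad, 0 = good.
ages = [20, 24, 28, 32, 36, 40, 44]
bad = [1, 1, 0, 1, 0, 0, 0]
# Hypothetical weights: rows below 25 are under-represented in the sample.
pop_weights = [3, 3, 1, 1, 1, 1, 1]

unweighted_thr = best_split(ages, bad)
weighted_thr = best_split(ages, bad, pop_weights)
```

On this toy data the unweighted threshold comes out at 34.0 and the weighted one at 26.0, so up-weighting the younger rows pulls the split down, which is the kind of shift I would expect from a tree model trained with weights.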

I found two possible ways to do this in LightGBM: set is_unbalance to False and set scale_pos_weight to our weight, or set is_unbalance to True and pass our weights as an array to the "sample_weight" parameter of the fit function.
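For reference, this is roughly how I would wire up the two options with the scikit-learn style `LGBMClassifier`. It is only a sketch: the helper names and the sample and population counts are hypothetical, label 1 is assumed to be "bad", and the `is_unbalance` argument is left as a parameter because whether it should be True or False next to explicit weights is exactly what I am unsure about. `lightgbm` is imported inside the functions so the weight helpers can be run on their own.

```python
def class_weights(sample_good, sample_bad, pop_good, pop_bad):
    """Per-class weights that restore the population good-to-bad ratio."""
    return {0: pop_good / sample_good, 1: pop_bad / sample_bad}


def row_weights(labels, weights_by_class):
    """Expand per-class weights into one weight per training row."""
    return [weights_by_class[y] for y in labels]


def fit_with_scale_pos_weight(X, y, pos_weight):
    """Option 1: a single scalar weight applied to the positive class."""
    import lightgbm as lgb  # imported lazily; only needed when fitting

    model = lgb.LGBMClassifier(is_unbalance=False, scale_pos_weight=pos_weight)
    model.fit(X, y)
    return model


def fit_with_sample_weight(X, y, weights, is_unbalance=False):
    """Option 2: one weight per row, passed through fit()."""
    import lightgbm as lgb  # imported lazily; only needed when fitting

    model = lgb.LGBMClassifier(is_unbalance=is_unbalance)
    model.fit(X, y, sample_weight=weights)
    return model


# Hypothetical counts: a balanced sample drawn from a 95:5 population.
w = class_weights(sample_good=1000, sample_bad=1000,
                  pop_good=95000, pop_bad=5000)
weights = row_weights([0, 1, 1], w)

# For option 1, only the ratio of bad weight to good weight is expressible.
pos_weight = w[1] / w[0]
```

One difference worth noting: scale_pos_weight is a single number per class, while sample_weight can vary row by row, so only the second option can express weights that depend on something other than the label (for example, on the age band).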

I don't know which of these is valid and actually achieves our goal of accounting for sample weights in LightGBM.

Kai Wang
