
I am interested in how sklearn applies the class weights we supply. The documentation doesn't state explicitly where and how the class weights are applied, and reading the source code doesn't help either (it seems that sklearn.svm.liblinear is used for the optimization, and I can't read its source code since it is a .pyd file...)

My guess is that it works on the cost function: when class weights are specified, the cost contributed by each observation is multiplied by the weight of its class. For example, if I have two observations, one from class 0 (weight=0.5) and one from class 1 (weight=1), then the cost function would be:

Cost = 0.5*log(...X_0,y_0...) + 1*log(...X_1,y_1...) + penalization

Does anyone know whether this is correct?

Rishav
lizardfireman

1 Answer


Check the following lines in the source code:

le = LabelEncoder()
if isinstance(class_weight, dict) or multi_class == 'multinomial':
    class_weight_ = compute_class_weight(class_weight, classes, y)
    sample_weight *= class_weight_[le.fit_transform(y)]

Here is the source code for the compute_class_weight() function:

...
else:
    # user-defined dictionary
    weight = np.ones(classes.shape[0], dtype=np.float64, order='C')
    if not isinstance(class_weight, dict):
        raise ValueError("class_weight must be dict, 'balanced', or None,"
                         " got: %r" % class_weight)
    for c in class_weight:
        i = np.searchsorted(classes, c)
        if i >= len(classes) or classes[i] != c:
            raise ValueError("Class label {} not present.".format(c))
        else:
            weight[i] = class_weight[c]
...
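To make the two excerpts concrete, here is a minimal pure-Python sketch of the same idea: a user-supplied dictionary becomes one weight per class (defaulting to 1), and each sample then picks up the weight of its class. The helper name `expand_class_weights` is hypothetical and not part of sklearn; the real code uses NumPy arrays, `np.searchsorted` and `LabelEncoder` rather than a plain dict lookup.

```python
def expand_class_weights(y, class_weight):
    """Return one weight per sample, looked up from its class label.

    Hypothetical helper for illustration only (not an sklearn function).
    Classes missing from `class_weight` keep the default weight 1.0,
    mirroring the np.ones(...) initialisation in the excerpt above.
    """
    classes = sorted(set(y))
    for c in class_weight:
        if c not in classes:
            # Same error condition as in the excerpt above.
            raise ValueError("Class label {} not present.".format(c))
    weight = {c: float(class_weight.get(c, 1.0)) for c in classes}
    # Analogue of class_weight_[le.fit_transform(y)]: index by label.
    return [weight[label] for label in y]


y = [0, 1, 1, 0, 1]
per_sample = expand_class_weights(y, {0: 0.5, 1: 1.0})
print(per_sample)  # [0.5, 1.0, 1.0, 0.5, 1.0]
```

In the sklearn excerpt these per-sample factors are multiplied into any existing `sample_weight`, so class weights and sample weights combine as a product.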

In the first snippet, class_weight is folded into sample_weight, which is then used in a few internal functions such as _logistic_loss_and_grad, _logistic_loss, etc.:

# Logistic loss is the negative of the log of the logistic function.
out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w)
# NOTE: --->  ^^^^^^^^^^^^^^^
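As a rough check of the formula guessed in the question, the weighted loss line above can be re-implemented in plain Python. This is only a sketch under stated assumptions: labels are encoded as -1/+1, there is no intercept, and the function names `log_logistic` and `weighted_logistic_loss` are my own stand-ins rather than sklearn's actual implementations.

```python
import math


def log_logistic(z):
    """Numerically stable log(1 / (1 + exp(-z))) for a scalar z."""
    if z > 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def weighted_logistic_loss(w, X, y, sample_weight, alpha):
    """L2-penalised logistic loss with per-sample weights.

    Assumes y holds -1/+1 labels and the model has no intercept.
    """
    total = 0.0
    for x_i, y_i, s_i in zip(X, y, sample_weight):
        yz = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        total -= s_i * log_logistic(yz)  # each term is scaled by its weight
    return total + 0.5 * alpha * sum(w_j * w_j for w_j in w)


# Two observations: one from class 0 (weight 0.5), one from class 1 (weight 1).
X = [[1.0, 2.0], [3.0, -1.0]]
y = [-1, 1]
loss = weighted_logistic_loss([0.0, 0.0], X, y, [0.5, 1.0], alpha=1.0)
# With w = 0 every unweighted term equals log(2), so the total is 1.5 * log(2).
assert abs(loss - 1.5 * math.log(2)) < 1e-12
```

Under these assumptions the cost has the shape proposed in the question: each observation's log-loss term is multiplied by the weight of its class, and the penalty term is left unweighted.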
MaxU - stand with Ukraine