
I do semantic segmentation with TensorFlow 1.12 and Keras. I supply a vector of weights (size equal to the number of classes) to tf.keras.Model.fit() using its class_weight parameter. I was wondering how this works internally. I use custom loss functions (dice loss and focal loss, among others), and the weights cannot be premultiplied with the predictions or the one-hot ground truth before being fed to the loss function, since that wouldn't make any sense. My loss function outputs one scalar value, so it also cannot be multiplied with the function output. So where and how exactly are the class weights taken into account?

My custom loss function is:

def cross_entropy_loss(onehots_true, logits): # Inputs are [BATCH_SIZE, height, width, num_classes]
    logits, onehots_true = mask_pixels(onehots_true, logits) # Removes pixels for which no ground truth exists, and returns shape [num_gt_pixels, num_classes]
    return tf.losses.softmax_cross_entropy(onehots_true, logits)
  • Have you checked my answer? Please let me know if it is what you want. – eugen Sep 16 '19 at 05:52
  • Sorry for my late reaction. Your answer is very helpful! I still don't understand when the `class_sample_weight`s are being applied, but I haven't had time to further explore the source code yet. –  Sep 16 '19 at 09:18

2 Answers


As mentioned in the Keras Official Docs,

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

Basically, we provide class weights when we have a class imbalance, meaning the training samples are not uniformly distributed among the classes: some classes have fewer samples, while others have more.

We need the classifier to pay more attention to the under-represented classes. One way to do this is to increase the loss contribution of classes with few samples: a larger loss produces larger gradients, so the optimizer adjusts the model more strongly for those classes.

In terms of Keras, we pass a dict mapping class indices to their weights (the factors by which the loss value will be multiplied). For example,

class_weights = { 0 : 1.2 , 1 : 0.9 }

Internally, the loss values for classes 0 and 1 will be multiplied by their corresponding weight values.

weighted_loss_class0 = loss0 * class_weights[0]
weighted_loss_class1 = loss1 * class_weights[1]

Now, weighted_loss_class0 and weighted_loss_class1 will be used for backpropagation.
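A minimal numerical sketch of this weighting, using NumPy and toy loss values (the per-class loss numbers here are made up for illustration):

```python
import numpy as np

# Hypothetical per-class loss values (e.g. mean cross-entropy per class)
loss_per_class = np.array([0.8, 0.5])  # loss for class 0, class 1

# The class_weights from the example above, as an array indexed by class
class_weights = np.array([1.2, 0.9])   # {0: 1.2, 1: 0.9}

# Each class's loss is scaled by its weight before being combined
weighted_loss = loss_per_class * class_weights  # [0.96, 0.45]
total_loss = weighted_loss.sum()
```

Class 0's contribution grows (1.2 > 1) and class 1's shrinks (0.9 < 1), so gradients push the model harder to fix class 0.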

See this and this.

Shubham Panchal
  • Thanks for your reply. I know what class weights are good for and why they are used, I was just wondering how they are actually applied. You mention that they are multiplied with the loss of each separate class, but _where_ does that happen? My loss function outputs one scalar value, so how can a weighted average over class-specific losses be taken? –  Sep 14 '19 at 16:39
  • It would be nice if you could share the code of the loss function. – Shubham Panchal Sep 15 '19 at 00:38
  • Cool. What should the structure of dictionary you feed to class_weight be when you have multiple outputs? – grofte Jun 03 '20 at 15:17

You can refer to the code below from the Keras source code on GitHub:

    class_sample_weight = np.asarray(
        [class_weight[cls] for cls in y_classes if cls in class_weight])

    if len(class_sample_weight) != len(y_classes):
        # subtract the sets to pick all missing classes
        existing_classes = set(y_classes)
        existing_class_weight = set(class_weight.keys())
        raise ValueError(
            '`class_weight` must contain all classes in the data.'
            ' The classes %s exist in the data but not in '
            '`class_weight`.' % (existing_classes - existing_class_weight))

    if class_sample_weight is not None and sample_weight is not None:
        # Multiply weights if both are provided.
        return class_sample_weight * sample_weight

So as you can see, class_weight is first transformed into a NumPy array class_sample_weight holding one weight per training sample (looked up by that sample's class label), and then it is multiplied with the sample_weight. The result is used as a per-sample weight when the loss is reduced, which is why a scalar-valued custom loss still ends up weighted per class.
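A standalone sketch of that mapping, using only NumPy (the label values and weights here are illustrative, not from the question):

```python
import numpy as np

# Integer class label for each training sample
y_classes = np.array([0, 1, 1, 0, 1])

# The class_weight dict passed to fit()
class_weight = {0: 1.2, 1: 0.9}

# Same lookup as in the Keras source: one weight per sample
class_sample_weight = np.asarray(
    [class_weight[cls] for cls in y_classes if cls in class_weight])

# If a sample_weight array was also passed to fit(), the two are multiplied
sample_weight = np.ones(len(y_classes))
effective_weight = class_sample_weight * sample_weight  # [1.2, 0.9, 0.9, 1.2, 0.9]
```

Each sample thus carries the weight of its class, and Keras applies this array to the per-sample losses before reduction.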

source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training_utils.py

eugen