
I have a dataset of images that has the following distribution:

  • Class 0: 73.5%
  • Class 1: 7%
  • Class 2: 15%
  • Class 3: 2.5%
  • Class 4: 2%

I think I need to add class weights to compensate for the low number of images in classes 1, 2, 3, and 4.

I have tried calculating the class weights by dividing the share of class 0 by the share of class 1, class 0 by class 2, and so forth.

I'm assuming that class 0 corresponds to a weight of 1, as it doesn't need to be scaled? Not sure if that is correct, though.

class_weights = np.array([1, 10.5, 4.9, 29.4, 36.75]) 

and added them to my fit function:

model.fit(x_train, y_train, batch_size=batch_size, class_weight=class_weights, epochs=epochs, validation_data=(x_test, y_test))

I'm unsure whether I have calculated the weights correctly, and whether this is even how it is supposed to be done.
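For reference, the division described above can be reproduced like this (percentages taken from the distribution at the top):

```python
import numpy as np

# Class shares from the distribution above, in percent.
shares = np.array([73.5, 7.0, 15.0, 2.5, 2.0])

# Weight each class by (majority share / class share),
# so the majority class 0 gets weight 1.
class_weights = shares[0] / shares
print(class_weights)  # [ 1.    10.5    4.9   29.4   36.75]
```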

Hopefully someone can help clarify this.

jeez

3 Answers


First of all, make sure to pass a dictionary, since the class_weight parameter expects a dictionary.

Second, the point of weighting the classes is as follows. Let's say you have a binary classification problem where class_1 has 1000 instances and class_2 has 100. Since you want to make up for the imbalanced data, you can set the weights as:

class_weights={"class_1": 1, "class_2": 10}

In other words, if the model makes a mistake on a sample whose true label is class_2, it will be penalized 10 times more than for a mistake on a sample whose true label is class_1. You want something like this because, given the class distribution in the data, the model has an inherent tendency to overfit on class_1, since it is overrepresented by default. By setting the class weights you impose an implicit constraint on the model: making wrong predictions on 10 instances of class_1 is as bad as making 1 wrong prediction on an instance of class_2.
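A toy numpy sketch of that penalization, using the illustrative numbers from the binary example above (not the asker's actual loss values):

```python
import numpy as np

# Cross-entropy for one misclassified sample that assigned
# probability 0.1 to its true class.
ce = -np.log(0.1)

# Class weights from the example above.
weights = {"class_1": 1, "class_2": 10}

loss_class_1_mistake = weights["class_1"] * ce
loss_class_2_mistake = weights["class_2"] * ce

# A mistake on a class_2 sample contributes 10x as much to
# the total loss, so 10 class_1 mistakes cost the same as
# 1 class_2 mistake.
```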

With that said, you can set the class weights however you want; there is no single right or wrong way to do it. The way you set the weights seems reasonable to me.

gorjan
  • Thank you for your response. My classes are 0, 1, 2, 3, and 4 in my .csv file. I tried making a dict like you suggested: class_weights={"0": 1, "1": 10.5, "2": 4.8, "3": 29.5, "4": 36.4}. I'm getting the following error: ValueError: `class_weight` must contain all classes in the data. The classes {0, 1, 2, 3, 4} exist in the data but not in `class_weight`. Have you any idea why this is happening? Is it wrong to name them 0, 1, 2, 3, 4? – jeez Dec 20 '18 at 00:45
  • My csv file is structured like this: image, level as the header, then 10_left, 0, 15_left, 1, 15_right, 2, 16_left, 4, etc. Am I wrong to assume the class names should be 0, 1, 2, 3, 4, as those are the classes I have? – jeez Dec 20 '18 at 00:55
  • @jeez Very late, but your dictionary should not have strings as the keys. So it should be `class_weights = {0: 1, 1: 10.5, 2: 4.8, 3: 29.5, 4: 36.4}` – a.powell Apr 21 '20 at 19:34

Please see this answer for a proper solution: https://datascience.stackexchange.com/a/18722

I understand that you are trying to set class weights, but also consider image augmentation to generate more images for the underrepresented classes.
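For example, here is a minimal numpy sketch of one such augmentation: oversampling an underrepresented class with horizontally flipped copies. The array shapes and the choice of class 3 are hypothetical stand-ins, not the asker's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the real image data: 100 NHWC images, labels 0-4.
x_train = rng.random((100, 32, 32, 3))
y_train = rng.integers(0, 5, size=100)

# Collect the images of an underrepresented class (here class 3)
# and add horizontally flipped copies of them.
minority = x_train[y_train == 3]
flipped = minority[:, :, ::-1, :]  # flip along the width axis

x_aug = np.concatenate([x_train, flipped])
y_aug = np.concatenate([y_train, np.full(len(flipped), 3)])
```

In a Keras workflow the same idea is usually handled by an augmentation pipeline that generates flipped, rotated, or zoomed variants on the fly during training, rather than materializing them up front.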


I solved the problem. Thank you so much, gorjan.

class_weight = {0: 1.0,
                1: 10.5,
                2: 4.8,
                3: 29.5,
                4: 36.4}

Instead of writing the class names as strings, for example "0" or "1", using plain integer keys without the quotes is what did the trick :-) along with using a dict as you suggested instead of the np array.

jeez