
I have a dataset of images that has the following distribution:

  • Class 0: 73.5%
  • Class 1: 7%
  • Class 2: 15%
  • Class 3: 2.5%
  • Class 4: 2%

I think I need to add class weights to compensate for the low number of images in classes 1, 2, 3, and 4.

I have tried calculating the class weights by dividing the share of class 0 by the share of class 1, class 0 by class 2, and so forth.

I'm assuming that class 0 corresponds to a weight of 1, as it doesn't need to be scaled? Not sure if that is correct, though.

class_weights = np.array([1, 10.5, 4.9, 29.4, 36.75]) 

and added them to my fit function:

model.fit(x_train, y_train, batch_size=batch_size, class_weight=class_weights, epochs=epochs, validation_data=(x_test, y_test))

I'm unsure whether I have calculated the weights correctly, and whether this is even how it is supposed to be done.
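For reference, the division described above can be reproduced like this (percentages taken from the distribution at the top):

```python
import numpy as np

# Class shares from the distribution above, in percent.
shares = np.array([73.5, 7.0, 15.0, 2.5, 2.0])

# Weight each class by (majority share / class share),
# so the majority class 0 gets weight 1.
class_weights = shares[0] / shares
print(class_weights)  # [ 1.    10.5    4.9   29.4   36.75]
```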

Hopefully someone can help clarify this.

jeez

3 Answers


First of all, make sure to pass a dictionary, since the class_weight parameter expects a dictionary.

Second, the point of weighting the classes is as follows. Let's say you have a binary classification problem where class_1 has 1000 instances and class_2 has 100. Since you want to make up for the imbalanced data, you can set the weights as:

class_weights={"class_1": 1, "class_2": 10}

In other words, if the model makes a mistake on a sample whose true label is class_2, it will be penalized 10 times more than for a mistake on a sample whose true label is class_1. You want something like this because, given the class distribution in the data, the model has an inherent tendency to overfit on class_1, since it is overrepresented by default. By setting the class weights you impose an implicit constraint on the model: making wrong predictions on 10 instances of class_1 is as bad as making 1 wrong prediction on an instance of class_2.
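A toy numpy sketch of that penalization, using the illustrative numbers from the binary example above (not the asker's actual loss values):

```python
import numpy as np

# Cross-entropy for one misclassified sample that assigned
# probability 0.1 to its true class.
ce = -np.log(0.1)

# Class weights from the example above.
weights = {"class_1": 1, "class_2": 10}

loss_class_1_mistake = weights["class_1"] * ce
loss_class_2_mistake = weights["class_2"] * ce

# A mistake on a class_2 sample contributes 10x as much to
# the total loss, so 10 class_1 mistakes cost the same as
# 1 class_2 mistake.
```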

With that said, you can set the class weights however you want; there is no single right or wrong way to do it. The way you set the weights seems reasonable to me.

gorjan
  • Thank you for your response. My classes are 0, 1, 2, 3, and 4 in my .csv file. I tried making a dict like you suggested: class_weights={"0": 1, "1": 10.5, "2": 4.8, "3": 29.5, "4": 36.4}. I'm getting the following error: ValueError: `class_weight` must contain all classes in the data. The classes {0, 1, 2, 3, 4} exist in the data but not in `class_weight`. Have you any idea why this is happening? Is it wrong to name them 0, 1, 2, 3, 4? – jeez Dec 20 '18 at 00:45
  • My csv file is structured like this: image, level as the header, then 10_left, 0, 15_left, 1, 15_right, 2, 16_left, 4, etc. Am I wrong to assume the class names should be 0, 1, 2, 3, 4, as those are the classes I have? – jeez Dec 20 '18 at 00:55
  • @jeez Very late, but your dictionary should not have strings as the keys. So it should be `class_weights = {0: 1, 1: 10.5, 2: 4.8, 3: 29.5, 4: 36.4}` – a.powell Apr 21 '20 at 19:34

Please see this answer for a proper solution: https://datascience.stackexchange.com/a/18722

I understand that you are trying to set class weights, but also consider image augmentation to generate more images for the underrepresented classes.
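For example, here is a minimal numpy sketch of one such augmentation: oversampling an underrepresented class with horizontally flipped copies. The array shapes and the choice of class 3 are hypothetical stand-ins, not the asker's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the real image data: 100 NHWC images, labels 0-4.
x_train = rng.random((100, 32, 32, 3))
y_train = rng.integers(0, 5, size=100)

# Collect the images of an underrepresented class (here class 3)
# and add horizontally flipped copies of them.
minority = x_train[y_train == 3]
flipped = minority[:, :, ::-1, :]  # flip along the width axis

x_aug = np.concatenate([x_train, flipped])
y_aug = np.concatenate([y_train, np.full(len(flipped), 3)])
```

In a Keras workflow the same idea is usually handled by an augmentation pipeline that generates flipped, rotated, or zoomed variants on the fly during training, rather than materializing them up front.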


I solved the problem. Thank you so much, gorjan.

class_weight = {0: 1.0,
                1: 10.5,
                2: 4.8,
                3: 29.5,
                4: 36.4}

Instead of writing the class names as strings, for example "0" or "1", using plain integer keys without the quotes is what did the trick :-) along with using a dict as you suggested instead of the np array.

jeez