
I have multi-label data (some classes have 2 labels and some have 10), and my model overfits with both class_weight='balanced' and class_weight=None. What are the best values to set for the class_weight parameter?

from sklearn.svm import LinearSVC
svm = LinearSVC(C=0.01,max_iter=100,dual=False,class_weight=None,verbose=1)
seralouk

1 Answer


The class_weight parameter actually scales the C parameter for each class, as the documentation describes:

class_weight : {dict, ‘balanced’}, optional

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
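To make the "balanced" formula concrete, here is a minimal sketch that computes it by hand with NumPy. The label vector is a made-up toy example, and `np.unique` is used instead of `np.bincount` only because the labels are not contiguous integers starting at 0.

```python
import numpy as np

# Toy labels (assumed for illustration): class 121 is common, class 123 is rare
y = np.array([121, 121, 121, 121, 122, 122, 123])

# Unique labels and how often each one occurs
classes, counts = np.unique(y, return_counts=True)
n_samples = y.size
n_classes = classes.size

# Same formula as in the docs: n_samples / (n_classes * count_of_class)
balanced = n_samples / (n_classes * counts)

# Map each label to its weight; rarer classes get larger weights
balanced_weights = {int(c): float(w) for c, w in zip(classes, balanced)}
print(balanced_weights)
```

The effective penalty for class i is then balanced_weights[i] * C. scikit-learn also ships `sklearn.utils.class_weight.compute_class_weight`, which should give the same numbers for `class_weight='balanced'`.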

Try experimenting with class_weight while keeping C fixed, e.g. C=0.1.
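One possible way to run that experiment is to cross-validate a few class_weight settings at a fixed C and compare a metric that is sensitive to minority classes. This is only a sketch: the synthetic data from `make_classification`, the hand-picked "custom" weights, and the choice of macro F1 with 3 folds are all assumptions, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic imbalanced 3-class data (an assumption; replace with your own X, y)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

# Settings to compare; the "custom" weights are hypothetical, hand-picked values
candidates = {
    "none": None,
    "balanced": "balanced",
    "custom": {0: 1, 1: 2, 2: 3},
}

scores = {}
for name, cw in candidates.items():
    # C stays fixed at 0.1; only class_weight changes between runs
    clf = LinearSVC(C=0.1, dual=False, class_weight=cw, max_iter=1000)
    scores[name] = cross_val_score(clf, X, y, cv=3, scoring="f1_macro").mean()
    print(name, round(scores[name], 3))
```

Note that stratified cross-validation needs at least as many samples per class as there are folds, so very rare classes may force a smaller `cv` or a separate hold-out split.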


EDIT

Here is a simple way to create the class_weight dictionary for your 171 classes.

# store the weight for each class in a list (one weight per unique class)
weights_per_class = [2, 3, 4, 5, 6]

# Let's assume the unique class labels in your `y` are:
classes = [121, 122, 123, 124, 125]

You need:

# create the `class_weight` dictionary by pairing each label with its weight
class_weight = dict(zip(classes, weights_per_class))

print(class_weight)
# {121: 2, 122: 3, 123: 4, 124: 5, 125: 6}

# Use it as argument
from sklearn.svm import LinearSVC
svm = LinearSVC(class_weight=class_weight)
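With 171 classes, typing the weights by hand is impractical, so the dictionary can also be derived from the label vector itself. The sketch below assumes a toy `y` and a hypothetical damping heuristic (the `power` exponent is not a scikit-learn parameter) that sits between `class_weight=None` and `'balanced'`, which may be worth trying when both extremes overfit.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical label vector with non-contiguous labels, as in the question
y = np.array([121, 121, 121, 121, 122, 122, 123, 125, 125, 125])

# Unique labels and their frequencies
classes, counts = np.unique(y, return_counts=True)

# Assumed heuristic: weight = (largest count / class count) ** power.
# power=1 is plain inverse frequency (proportional to 'balanced');
# power=0 gives every class weight 1 (same as class_weight=None).
power = 0.5
weights = (counts.max() / counts) ** power

# One entry per class label present in y
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)

svm = LinearSVC(C=0.1, dual=False, class_weight=class_weight)
```

The most frequent class keeps weight 1 and rarer classes are scaled up; `power` can then be tuned like any other hyperparameter.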
seralouk
  • I have 171 classes in my dataset. How do I set weights for all of them? Could you give me an example? – Sample Test Nov 21 '19 at 14:44
  • Do the class labels in your `y` start from `0` or `1`? In other words, is the label of the first class 0 or 1? I will update my answer based on your reply. – seralouk Nov 21 '19 at 14:58
  • Well, I have labels like 121, 122, 123 and so on. They are not continuous, and there are 171 in total. Is adding more weight to an under-represented label the same as oversampling? If not, what is the difference? – Sample Test Nov 21 '19 at 16:27
  • See my updated answer. Consider accepting and upvoting. – seralouk Nov 21 '19 at 16:32
  • Sure. Could you also consider this question: https://stackoverflow.com/questions/58991545/suming-up-two-rows-in-data-frame-based-on-a-condition-in-python – Sample Test Nov 22 '19 at 09:51