
I am working on a multi-class classification use case and the data is highly imbalanced. By highly imbalanced I mean that there is a huge difference between the most frequent class and the least frequent class. If I go ahead with SMOTE oversampling, the data size increases tremendously (from 280k rows to more than 25 billion rows, because the imbalance is that severe) and it becomes practically impossible to fit an ML model to such a huge dataset. Similarly, I can't use undersampling, as that would lead to a loss of information.

So I thought of using compute_class_weight from sklearn while creating the model.

Code:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# one weight per class, inversely proportional to its frequency
class_weight = compute_class_weight(class_weight='balanced',
                                    classes=np.unique(train_df['Label_id']),
                                    y=train_df['Label_id'])

# map each class label to its computed weight
dict_weights = dict(zip(np.unique(train_df['Label_id']), class_weight))

svc_model = LinearSVC(class_weight=dict_weights)

I made predictions on the test data and noted the results for metrics like accuracy, f1_score, recall, etc. Then I repeated the same experiment, but without passing class_weight, like this:

svc_model = LinearSVC()

But the results I obtained were strange: the metrics after passing class_weight were slightly worse than the metrics without class_weight.

I was hoping for the exact opposite, since I am using class_weight precisely to make the model, and hence the metrics, better.

The difference between the two models was minimal, but the f1_score was lower for the model with class_weight than for the model without it.

I also tried the snippet below:

svc_model = LinearSVC(class_weight='balanced')

but the f1_score was still lower than for the model without class_weight.

Below are the metrics I obtained:

LinearSVC w/o class_weight

Accuracy: 89.02, F1 score: 88.92, Precision: 89.17, Recall: 89.02, Misclassification error: 10.98

LinearSVC with class_weight='balanced'

Accuracy: 87.98, F1 score: 87.89, Precision: 88.3, Recall: 87.98, Misclassification error: 12.02

LinearSVC with class_weight=dict_weights

Accuracy: 87.97, F1 score: 87.87, Precision: 88.34, Recall: 87.97, Misclassification error: 12.03
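
For reference, metrics like these can be computed along the following lines; y_test and y_pred are placeholder names for the true and predicted test labels, and weighted averaging is assumed for the multi-class scores (as confirmed for f1_score in a comment below).

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# y_test / y_pred: placeholders for the true and predicted test labels
acc = accuracy_score(y_test, y_pred)
print('Accuracy:', 100 * acc)
print('F1 score:', 100 * f1_score(y_test, y_pred, average='weighted'))
print('Precision:', 100 * precision_score(y_test, y_pred, average='weighted'))
print('Recall:', 100 * recall_score(y_test, y_pred, average='weighted'))
print('Misclassification error:', 100 * (1 - acc))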

I assumed that using class_weight would improve the metrics, but instead it is deteriorating them. Why is this happening, and what should I do? Is it okay if I simply don't handle the class imbalance?


2 Answers


It's not guaranteed that using class_weight will always improve performance; there is always some uncertainty involved when working with stochastic systems.

You can also look at the older class_weight='auto' option (deprecated and later removed in favour of 'balanced' in newer scikit-learn versions). Here's a related discussion: https://github.com/scikit-learn/scikit-learn/issues/4324

Finally, you seem to be using the default hyperparameters of the linear SVM (i.e. C=1). I would suggest experimenting with the hyperparameters, ideally with a grid search, as sketched below. If class_weight still decreases performance after tuning, try data normalization.
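
A minimal sketch of such a grid search; the parameter ranges and the X_train / y_train names are illustrative assumptions, not tuned values.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# X_train / y_train stand in for your feature matrix and train_df['Label_id']
param_grid = {
    'C': [0.01, 0.1, 1, 10],             # regularisation strength
    'class_weight': [None, 'balanced'],  # compare both settings directly
}

grid = GridSearchCV(LinearSVC(), param_grid, scoring='f1_weighted', cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)

This compares the class_weight settings at their best C instead of only at the default C=1.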

Zabir Al Nazi
  • The data I am working on is textual data. What do you mean by data normalization? Could you please elaborate? – learnToCode May 11 '20 at 14:10
  • For text data, SVC seems like a non-ideal option. Have you tried an MLP, or, if you are interested in DL, an LSTM/GRU? If it's text data you can also try removing stop words. – Zabir Al Nazi May 11 '20 at 14:13
  • Agreed, but it isn't actual text data with paragraphs or long sentences. The feature contains at most 4 words, so it can't be considered real textual data where we would use NLP or any DL models. I hope you get my point. This use case can easily be solved using `tfidf` along with an ML model. As you can also see, the `f1_score` we are getting isn't that bad. – learnToCode May 11 '20 at 14:36

How I see the problem

My understanding is that your class-weight approach IS actually improving your model, but you (probably) don't see it. Here is why:

Assume you have 10 POS and 1,000 NEG samples, and two models: M-1 predicts all of the NEG samples correctly (i.e. no false positives) but only 2 out of 10 POS samples correctly. M-2 predicts 700 of the NEG samples and 8 of the 10 POS samples correctly. From an anomaly-detection point of view, the second model might be preferred, while the first model (which has clearly fallen into the imbalance trap) has the higher f1 score.

Class weights attempt to solve your imbalance issue, shifting your model from M-1 towards M-2. Thus your f1 score might decrease slightly, but you might end up with a model of better quality.
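
Here is a quick sketch that checks the toy numbers above (hypothetical counts from the example, not real data):

import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = np.array([1] * 10 + [0] * 1000)        # 10 POS, 1,000 NEG

# M-1: 2/10 POS correct, all NEG correct
m1 = np.array([1] * 2 + [0] * 8 + [0] * 1000)
# M-2: 8/10 POS correct, 700/1,000 NEG correct
m2 = np.array([1] * 8 + [0] * 2 + [1] * 300 + [0] * 700)

for name, pred in [('M-1', m1), ('M-2', m2)]:
    print(name,
          'F1(pos):', round(f1_score(y_true, pred), 3),
          'Recall(pos):', round(recall_score(y_true, pred), 3))

With these counts, M-1 still gets the higher F1 on the positive class even though M-2 recovers far more of the rare positives.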

How you can validate my opinion

You can check my point by looking at the confusion matrix, to see whether the f1 score dropped because your majority class is now misclassified more often, and whether your minority classes now have more true positives. You can also test metrics designed specifically for imbalanced classes; the one I know of is Cohen's Kappa. Maybe you will see that class weights actually increase the Kappa score.
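
A small sketch of that check; svc_plain / svc_weighted and X_test / y_test are assumed names for the two fitted models and the test split.

from sklearn.metrics import cohen_kappa_score, classification_report

for name, model in [('no class_weight', svc_plain), ('balanced', svc_weighted)]:
    pred = model.predict(X_test)
    print(name, 'kappa:', cohen_kappa_score(y_test, pred))
    # per-class precision/recall is easier to scan than a huge confusion matrix
    # print(classification_report(y_test, pred))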

And one more thing: do some bootstrapping or cross-validation; the change in f1 score might just be due to data variability and mean nothing.
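
For example (X / y are assumed names for your full feature matrix and labels):

from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for cw in [None, 'balanced']:
    scores = cross_val_score(LinearSVC(class_weight=cw), X, y,
                             scoring='f1_weighted', cv=cv)
    print(cw, 'f1_weighted:', scores.mean(), '+/-', scores.std())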

Alireza
  • I did try `StratifiedKFold` CV but got similar results. Also, I have 600+ classes, so it is practically impossible to read a `confusion_matrix` in my case. Additionally, I am using `f1_score` with `average='weighted'` so that the score takes the class frequencies into account. So your POS/NEG example won't hold true in my case, as `f1_score` with `average='weighted'` will weight the NEG class accordingly. – learnToCode May 11 '20 at 18:09