I want to get an estimate of how well classifiers would work on an imbalanced dataset of mine. When I try to fit a KNN classifier from sklearn, it learns nothing for the minority class. So what I did was fit the classifier with k = R (where 1:R is the imbalance ratio), predict probabilities for each test point, and assign a point to the minority class if the classifier's probability output for the minority class is greater than R. I do this to get an estimate of how the classifier performs (F1-score). I don't need the classifier in production. Is what I'm doing right?
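For reference, here is a minimal sketch of the thresholding approach described above. The dataset, the value of R, and the threshold are all illustrative; in particular, a probability cannot exceed R when R > 1, so 1/R is used here as one plausible reading of the rule in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Toy imbalanced data, roughly 9:1 majority:minority (class 1 is the minority)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

R = 9  # imbalance ratio 1:R, also used as k, as described in the question
knn = KNeighborsClassifier(n_neighbors=R).fit(X_tr, y_tr)

# Predicted probability of the minority class for each test point
proba_minority = knn.predict_proba(X_te)[:, 1]

# Assign to the minority class once its probability exceeds the threshold
# (1/R here, as an assumption; adjust to whatever rule you intend)
y_pred = (proba_minority > 1.0 / R).astype(int)
f1 = f1_score(y_te, y_pred)
print(f1)
```

Moving the decision threshold like this is a standard way to trade precision against recall without retraining, so the F1 estimate it gives is legitimate as long as the same threshold would be used at evaluation time.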
-
Welcome to SO. The way around this is frequency based resampling. Possible duplicate of [this question](https://stackoverflow.com/questions/37876280/knn-with-class-weights-in-sklearn). – Sıddık Açıl Jun 24 '19 at 07:07
-
I also worked with imbalanced data once; at that time I used `SMOTE` and generated minority-class examples synthetically so that the ratio of majority to minority class data becomes `1:1`. You can check SMOTE here: https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.over_sampling.SMOTE.html – Vikas Gautam Jun 24 '19 at 07:10
-
Is there any way without re-sampling? – Nitin Shravan Jun 24 '19 at 07:12
-
I honestly don't know, but as an alternative, you can randomly choose majority-class data from the dataset such that the ratio with the minority class is always `1:1`. – Vikas Gautam Jun 24 '19 at 07:16
-
If you are not constrained in which classifier to use, you could try a classifier with parameters like decision trees or random forests, where you get to specify the class weights yourself. Doing so, your model will start picking up the minority classes as well. Please refer to https://stackoverflow.com/questions/37522191/sklearn-how-to-balance-classification-using-decisiontreeclassifier for the implementation details. – Parthasarathy Subburaj Jun 24 '19 at 07:26
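The class-weight suggestion above can be sketched as follows (the dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Toy imbalanced data, roughly 9:1 majority:minority
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights each class inversely to its frequency,
# so minority-class errors count as much as majority-class errors
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
preds = clf.predict(X_te)
print(classification_report(y_te, preds))
```

Unlike resampling, this changes only the training loss, so it sidesteps the objection raised in the comments about not wanting to resample.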
1 Answer
Since you have mentioned in the comments that you don't want to use resampling, one way out is batching. Create multiple datasets from your majority class so that each is in a 1:1 ratio with the minority class. Train multiple models, with each model getting one part of the majority set and all of the minority set. Make a prediction with every model, take a vote among them, and decide your final outcome.
But I would suggest using SMOTE over this method.
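The batching-and-voting idea above could look something like this (the helper name and defaults are my own, and labels are assumed to be 0/1 with 1 as the minority class):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def batched_ensemble_predict(X_train, y_train, X_test,
                             minority_label=1, n_models=5, rng_seed=0):
    """Split the majority class into n_models chunks, train one KNN per
    chunk paired with the full minority set, and majority-vote."""
    rng = np.random.default_rng(rng_seed)
    min_idx = np.where(y_train == minority_label)[0]
    maj_idx = rng.permutation(np.where(y_train != minority_label)[0])

    votes = []
    for chunk in np.array_split(maj_idx, n_models):
        idx = np.concatenate([min_idx, chunk])
        model = KNeighborsClassifier().fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))

    # Majority vote across the n_models predictions (labels assumed 0/1)
    return (np.stack(votes).mean(axis=0) >= 0.5).astype(int)
```

Choosing n_models close to the imbalance ratio R makes each chunk roughly the same size as the minority set, which is what gives each model a near 1:1 training set.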

secretive