I am using an SVM and my dataset is imbalanced. The resulting model classifies 99% of the samples as Class 0 and only 1% as Class 1. Is there any way to correctly classify an imbalanced dataset using an SVM?
2 Answers
There are many ways to work with an imbalanced dataset. I have most commonly used a couple of these:
- Penalize wrong outputs: if class A has far fewer samples than class B, you can increase the penalty incurred for misclassifying class A (see the sketch after this list).
- Use the SMOTE module: it takes the convex combination of two points in a given class and assigns the new synthetic point the same label as the two chosen points.
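As a rough sketch of both ideas, assuming scikit-learn and imbalanced-learn are available (X and y below are made-up placeholder data, not from the question):

import numpy as np
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE

# Placeholder data: 200 majority (class 0) and 20 minority (class 1) samples.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 200 + [1] * 20)

# Option 1: penalize mistakes on the rare class more heavily.
# class_weight='balanced' scales C inversely to the class frequencies;
# an explicit dict such as {0: 1, 1: 10} also works.
svm_weighted = SVC(kernel='rbf', class_weight='balanced')
svm_weighted.fit(X, y)

# Option 2: oversample the minority class with SMOTE, then fit a plain SVM.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
svm_smote = SVC(kernel='rbf')
svm_smote.fit(X_res, y_res)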
Other options include looking at different evaluation metrics and at validation strategies such as stratified k-fold.
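For instance, a small sketch of stratified cross-validation with an imbalance-aware score (reusing the placeholder svm_weighted, X and y from above):

from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified folds keep the 0/1 ratio the same in every split, and F1
# (or recall / balanced accuracy) is more informative than plain
# accuracy when one class dominates.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(svm_weighted, X, y, cv=cv, scoring='f1')
print(scores.mean())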

There are several ways to adapt an unbalanced dataset for regression/classification. Here I'm going to describe the oversampling and undersampling methods.
In oversampling, you duplicate rows from the minority class, even though that means some rows in your data end up exactly the same. In undersampling, you keep all the samples with class 1 and pick the same number of samples with label 0 (this is only a good option if you have a large number of samples).
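As a minimal illustration of the two on their own (NumPy only; y, idx_0 and idx_1 are made-up placeholders):

import numpy as np

# Hypothetical labels; idx_0 / idx_1 are the row indices of each class.
y = np.array([0] * 95 + [1] * 5)
idx_0 = np.where(y == 0)[0]
idx_1 = np.where(y == 1)[0]

# Oversampling: draw minority indices with replacement until both classes match.
idx_1_over = np.random.choice(idx_1, len(idx_0), replace=True)
balanced_over = np.concatenate([idx_0, idx_1_over])

# Undersampling: keep every minority sample, subsample the majority class.
idx_0_under = np.random.choice(idx_0, len(idx_1), replace=False)
balanced_under = np.concatenate([idx_0_under, idx_1])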
You could also use a mix of the two. Something like:
import numpy as np

def obtain_equal_idx(idx_0, idx_1, n_samples, ratio_unbalance):
    # Oversample the minority class: tile its indices until there are
    # at least n_samples of them to draw from.
    idx_1_repeated = np.repeat(idx_1, (n_samples // len(idx_1)) + 1)
    # Undersample the majority class, keeping ratio_unbalance times as
    # many class-0 indices as class-1 indices.
    idx_0s = np.random.choice(idx_0, ratio_unbalance * (n_samples // 2), replace=False)
    idx_1s = np.random.choice(idx_1_repeated, n_samples // 2, replace=False)
    return np.concatenate([idx_0s, idx_1s])
Here idx_0 holds the indices of all rows labeled 0, idx_1 the indices of the rows labeled 1, n_samples is the number of samples you want to get back, and ratio_unbalance is a small factor (usually 2 or 3) that keeps the returned sample slightly unbalanced, so your model still sees that class 0 is more common than class 1.
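A possible way to call it, with X and y as hypothetical placeholders for your feature matrix and label vector:

import numpy as np

# Hypothetical data: 1,000 rows, heavily skewed toward class 0.
y = np.array([0] * 950 + [1] * 50)
X = np.random.randn(len(y), 4)

idx_0 = np.where(y == 0)[0]
idx_1 = np.where(y == 1)[0]

# Draw roughly 300 rows with a 2:1 majority/minority ratio,
# then train the SVM on the resampled subset X_res, y_res.
idx = obtain_equal_idx(idx_0, idx_1, n_samples=200, ratio_unbalance=2)
X_res, y_res = X[idx], y[idx]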
