sampling ratio for imbalanced dataset

Question

I have an imbalanced dataset with two classes (+1, -1). The positives are only 7% of the dataset.

I want to classify using decision trees. I have tried downsampling the negatives to:

The same size as the positives
Double or triple the size of the positives

For all of them I got almost the same precision, but the recall on positives was much better for the first sample (negatives the same size as positives). I feel I'm missing something here, so what is bad about this sampling?

score 0 · Answer 1 · answered Dec 16 '17 at 14:09

It is fairly common to downsample a dominant class.

But you need to make sure to solve your actual problem.

If you downsample your classes to a 1:1 ratio, that may make certain evaluation metrics look good, but does this still reflect reality? Your classifier is trained as if 50% of cases were positive, but only 7% actually are. If "false positives" cost you a lot of money, this can be a problem.
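To make that concrete, here is a rough sketch of how precision changes with the share of positives when the classifier's error rates stay fixed. The true positive rate and false positive rate below are hypothetical numbers picked for illustration, not results from your data, and `precision_at_prevalence` is just a helper name I made up.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a fixed TPR and FPR at a given share of positives."""
    true_pos = tpr * prevalence            # fraction of all cases that are true positives
    false_pos = fpr * (1.0 - prevalence)   # fraction of all cases that are false positives
    return true_pos / (true_pos + false_pos)

# Hypothetical error rates, chosen only for illustration.
TPR, FPR = 0.80, 0.20

balanced = precision_at_prevalence(TPR, FPR, 0.50)  # evaluated on a 1:1 downsampled set
real = precision_at_prevalence(TPR, FPR, 0.07)      # 7% positives, as in the question

print(f"precision on 1:1 sample:   {balanced:.2f}")  # 0.80
print(f"precision at 7% positives: {real:.2f}")      # 0.23
```

With these assumed rates, the same classifier goes from 80% precision on the balanced sample to roughly 23% at the real class ratio. So whatever ratio you train on, measure precision and recall on a held-out set that keeps the original 7% positives.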
