I have an imbalanced dataset that has two classes (+1, -1). The positives are only 7% of the dataset.
I want to classify using Decision Trees. I have tried downsampling the negatives to:
- The same size as the positives
- Double or triple the size of the positives
For all of them I got almost the same precision; however, the recall of positives was much better for the first sample (negatives the same size as positives). But I feel I'm missing something here, so what is bad about this sampling?
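
To make the setup concrete, here is a minimal sketch of the downsampling I mean. It uses synthetic labels with the same 7% positive rate rather than my real data, and the helper name `downsample_negatives` and its `ratio` parameter are made up for illustration. It only shows how the samples are built, not the tree training.

```python
import random

def downsample_negatives(labels, ratio, seed=0):
    """Keep every positive (+1) and a random subset of negatives (-1).

    The number of negatives kept is ratio * (number of positives),
    capped at the number of negatives available.
    Returns the sorted indices of the rows that are kept.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == -1]
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept_neg = rng.sample(neg, n_keep)  # sampling without replacement
    return sorted(pos + kept_neg)

# Synthetic labels (an assumption): 70 positives out of 1000 rows, i.e. 7%.
labels = [1] * 70 + [-1] * 930

for ratio in (1, 2, 3):
    idx = downsample_negatives(labels, ratio)
    n_pos = sum(1 for i in idx if labels[i] == 1)
    n_neg = len(idx) - n_pos
    print(f"negatives at {ratio}x positives: {n_pos} positives, {n_neg} negatives")
```

Each of the three index sets is then used to select the training rows for a separate decision tree.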