I am trying to train a binary positive/negative classifier using an SVM in Encog. In this case the data set is highly unbalanced, with negative examples outnumbering positive examples roughly 30:1.
During training, I deliberately undersample the negative cases so that the positive and negative examples presented to the model are roughly balanced, an approach that has worked well for me on other problems. In this case, however, the resulting model has an unacceptably high false positive rate: when tested on an unbalanced test set, the false positives outnumber the true positives.
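For reference, my training setup looks roughly like this (a minimal sketch, not my actual pipeline: the undersample() helper, the data loading, and the RBF kernel with default parameters are all stand-ins):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.ml.svm.KernelType;
import org.encog.ml.svm.SVM;
import org.encog.ml.svm.SVMType;
import org.encog.ml.svm.training.SVMTrain;

public class UndersampledSvm {

    // Keep every positive example and a random subset of negatives of the
    // same size, so the model sees a roughly 1:1 class balance.
    static MLDataSet undersample(List<MLDataPair> positives, List<MLDataPair> negatives) {
        List<MLDataPair> negCopy = new ArrayList<>(negatives);
        Collections.shuffle(negCopy);
        List<MLDataPair> balanced = new ArrayList<>(positives);
        balanced.addAll(negCopy.subList(0, Math.min(positives.size(), negCopy.size())));
        Collections.shuffle(balanced);

        MLDataSet set = new BasicMLDataSet();
        for (MLDataPair pair : balanced) {
            set.add(pair);
        }
        return set;
    }

    static SVM train(MLDataSet balanced, int inputCount) {
        SVM svm = new SVM(inputCount, SVMType.SupportVectorClassification,
                KernelType.RadialBasisFunction);
        SVMTrain trainer = new SVMTrain(svm, balanced);
        trainer.iteration(); // one call: libsvm trains to convergence internally
        return svm;
    }
}
```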
Any suggestions for how to train so as to reduce the false positive rate? Training on unbalanced data (or on data with a closer-to-observed balance) reduces the overall number of positive predictions, but it doesn't seem to improve the ratio of true positives to false positives.
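One direction I've been wondering about: Encog's SVM wraps libsvm, which supports per-class penalty weights (the -wN switch in the command-line tool) that make errors on one class more expensive than errors on the other, so the false positive / false negative trade-off could be tuned directly rather than through the sampling ratio. A sketch of what I have in mind, assuming Encog 3.x, where SVM.getParams() exposes the repackaged libsvm svm_parameter, and assuming SVMTrain doesn't reset the weight fields (I haven't verified either):

```java
import org.encog.mathutil.libsvm.svm_parameter;
import org.encog.ml.svm.KernelType;
import org.encog.ml.svm.SVM;
import org.encog.ml.svm.SVMType;

public class WeightedSvm {

    static SVM build(int inputCount) {
        SVM svm = new SVM(inputCount, SVMType.SupportVectorClassification,
                KernelType.RadialBasisFunction);

        // Reach into the underlying libsvm parameters and make errors on the
        // negative class costlier than errors on the positive class, which
        // should push the decision boundary toward fewer false positives.
        svm_parameter p = svm.getParams();
        p.nr_weight = 2;
        p.weight_label = new int[] { 0, 1 };  // assumes ideals 0.0/1.0 become labels 0/1
        p.weight = new double[] { 5.0, 1.0 }; // 5x cost for mistakes on negatives
        return svm;
    }
}
```

If per-class weighting is the wrong mechanism here, or if Encog discards these fields during training, a pointer to the right approach would be appreciated.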