I currently have an imbalanced dataset of over 800,000 datapoints. The imbalance is severe: only 3,719 datapoints belong to one of the two classes. After undersampling the data with the NearMiss algorithm in Python and applying a Random Forest classifier, I am able to achieve the following results:
- Accuracy: 81.4%
- Precision: 82.6%
- Recall: 79.4%
- Specificity: 83.4%
However, when I re-test the same model on the full dataset, the confusion matrix shows a heavy bias towards the minority class, with a large number of false positives. Is this the correct way to test the model after undersampling?
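For reference, here is a minimal sketch of my pipeline. To keep it self-contained I use a synthetic imbalanced dataset from `make_classification` and simple random undersampling in place of NearMiss (in my real code I use `imblearn.under_sampling.NearMiss`); the evaluation step at the end is the part I'm unsure about:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Toy stand-in for my dataset: ~0.5% minority class.
X, y = make_classification(n_samples=20000, weights=[0.995, 0.005],
                           random_state=0)

# Undersample the majority class down to the minority count
# (random undersampling here; NearMiss in my actual code).
rng = np.random.default_rng(0)
min_idx = np.flatnonzero(y == 1)
maj_idx = rng.choice(np.flatnonzero(y == 0), size=min_idx.size,
                     replace=False)
bal = np.concatenate([min_idx, maj_idx])

# Train on the balanced subset.
clf = RandomForestClassifier(random_state=0).fit(X[bal], y[bal])

# Re-test on the FULL (imbalanced) dataset -- this is the step
# whose results look biased towards the minority class.
cm = confusion_matrix(y, clf.predict(X))
print(cm)
```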