If you have a chance to collect more data, that could be the best solution (assuming you have already attempted this step).
If precision is poor and recall is good, it indicates that your model is good at predicting the fraud class as fraud, but it is confused about the non-fraud class: much of the time it predicts non-fraud cases as fraud (assuming you set 0 for the majority class and 1 for the minority class).
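One quick way to confirm this pattern is to look at the per-class metrics and the confusion matrix. A minimal sketch, assuming y_test and y_pred are your test labels and predictions:

from sklearn.metrics import classification_report, confusion_matrix

# rows are true classes, columns are predicted classes;
# a large count in row 0, column 1 means non-fraud is being predicted as fraud
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))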
If you see this pattern, try reducing the undersampling rate for the majority class, i.e. keep more non-fraud examples in the training set.
Typically, undersampling/oversampling is done on the training split only, and this is the correct approach.
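For example, with the imbalanced-learn package (an assumption; any undersampling utility works the same way), the sampling_strategy parameter controls how much of the majority class is kept, and the sampler is applied to the training split only (X_train, y_train; see the stratified split sketch below):

from imblearn.under_sampling import RandomUnderSampler

# sampling_strategy=0.5 keeps roughly two non-fraud samples per fraud sample;
# lower it to keep even more of the majority class (i.e. undersample less)
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=42)
X_train_res, y_train_res = rus.fit_resample(X_train, y_train)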
However, before undersampling, make sure your training split has the same class distribution as the full dataset (use stratified splitting).
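A minimal sketch of a stratified split, assuming X and y are your features and labels:

from sklearn.model_selection import train_test_split

# stratify=y keeps the fraud/non-fraud ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)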
If you are using Python's sklearn library for training your classifier, set the parameter class_weight='balanced'.
For example:
from sklearn.linear_model import LogisticRegression

# 'balanced' weights each class inversely to its frequency in the training data
lr = LogisticRegression(class_weight='balanced')
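With class_weight='balanced', the rare fraud class contributes proportionally more to the loss, which often improves recall without resampling. Training then proceeds as usual (X_train, y_train from the stratified split above):

lr.fit(X_train, y_train)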
- Try different algorithms with different hyperparameters; if the model is underfitting, consider a more powerful model such as XGBoost (see the sketch below).
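A minimal sketch using the xgboost package (an assumption; the hyperparameter values are illustrative only). XGBoost's scale_pos_weight is its own way of handling imbalance and is usually set to the ratio of negative to positive samples:

from xgboost import XGBClassifier

# ratio of non-fraud to fraud samples in the training split
ratio = (y_train == 0).sum() / (y_train == 1).sum()

xgb = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    scale_pos_weight=ratio)
xgb.fit(X_train, y_train)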
If you undersample before splitting, the class distribution of the test split may not replicate the distribution of real-world data. Hence, people typically avoid sampling before splitting.