I have to evaluate a logistic regression model. The model is intended to detect fraud, so in real life the algorithm will face highly imbalanced data.
Some people say that I should balance only the training set, while the test set should stay similar to real-life data. On the other hand, many people say that the model must be both trained and tested on balanced samples.
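A minimal sketch of the "balance the training set only" approach follows, assuming random undersampling of the majority class. This is only one option; oversampling or class weights are common alternatives. The helper `undersample_majority` and the synthetic data are hypothetical, for illustration only. The key point is the order of operations: split first, then rebalance the training portion, so the test set keeps the real-life fraud rate.

```python
import random

def undersample_majority(X, y, seed=0):
    """Hypothetical helper: randomly drop label-0 rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    keep = pos_idx + rng.sample(neg_idx, len(pos_idx))
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

rng = random.Random(42)
# Hypothetical data: 10,000 transactions with roughly 1% fraud (label 1).
X = [[rng.random(), rng.random()] for _ in range(10_000)]
y = [1 if rng.random() < 0.01 else 0 for _ in range(10_000)]

# Split FIRST (70/30), so the test set keeps the natural class ratio.
idx = list(range(len(y)))
rng.shuffle(idx)
cut = len(idx) * 7 // 10
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Rebalance the training data only; the test set is left untouched.
X_train_bal, y_train_bal = undersample_majority(X_train, y_train)
print(f"train fraud rate: {sum(y_train_bal) / len(y_train_bal):.2f}")
print(f"test fraud rate:  {sum(y_test) / len(y_test):.3f}")
```

The balanced training rate is exactly 0.50 by construction, while the test rate stays near the simulated 1%.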
I tested my model on both a balanced and an unbalanced test set. I got the same ROC AUC (0.73) for both, but a different area under the precision-recall curve: 0.4 for the unbalanced set and 0.74 for the balanced one.
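The pattern described above (stable ROC AUC, shifting PR AUC) can be reproduced with a small simulation. This is a sketch under stated assumptions: the classifier scores are hypothetical Gaussian draws (frauds ~ N(1, 1), legitimate ~ N(0, 1), 1% fraud rate), not the model from the question, so the numbers will not match 0.73/0.4/0.74. PR AUC is computed here as step-wise average precision, which is one common definition. ROC AUC depends only on how positives rank against negatives, so discarding negatives leaves it roughly unchanged. Precision, by contrast, depends on how many negatives are present, so it rises when the test set is balanced.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank statistic (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels))  # ascending by score
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, (_, y) in enumerate(ranked, start=1) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """Step-wise PR AUC: mean of the precision measured at each positive's rank."""
    ranked = sorted(zip(scores, labels), reverse=True)  # descending by score
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k
    return total / sum(labels)

rng = random.Random(0)
# Hypothetical scores: 500 frauds ~ N(1, 1) and 50,000 legitimate ~ N(0, 1).
fraud = [rng.gauss(1.0, 1.0) for _ in range(500)]
legit = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

# "Real-life" test set: natural 1% fraud rate.
imb_scores = fraud + legit
imb_labels = [1] * len(fraud) + [0] * len(legit)

# "Balanced" test set: same frauds, randomly undersampled legitimate cases.
legit_sample = rng.sample(legit, len(fraud))
bal_scores = fraud + legit_sample
bal_labels = [1] * len(fraud) + [0] * len(legit_sample)

auc_imb = roc_auc(imb_labels, imb_scores)
auc_bal = roc_auc(bal_labels, bal_scores)
ap_imb = average_precision(imb_labels, imb_scores)
ap_bal = average_precision(bal_labels, bal_scores)
print(f"ROC AUC  imbalanced={auc_imb:.3f}  balanced={auc_bal:.3f}")
print(f"PR AUC   imbalanced={ap_imb:.3f}  balanced={ap_bal:.3f}")
```

Both ROC AUC values should land near the theoretical 0.76 for these distributions, while the imbalanced average precision is far lower than the balanced one, even though the scores themselves are identical.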
Which approach should I choose?
And which metrics should I use to evaluate my model's performance?