I applied logistic regression to data containing both binary and numerical predictors, with a binary target. In the resulting confusion matrix, true negatives are the largest share (65%), followed by false positives (>20%), which exceed the true positives (8%). I would like to understand why this might be happening and what steps I can take to increase the true positives.

For additional information: as part of the data preprocessing I eliminated outliers, imputed missing values, and applied MinMaxScaler and PowerTransformer. My data is also imbalanced (90% 0s, 10% 1s), and I applied SMOTE to upsample the minority class before fitting the logistic regression.

LBala
  • Are you overfitting? Maybe you can try to include more features if you have any at hand. Perhaps logreg doesn't work but another algorithm might work better (tree based or neural networks given enough data). – toni057 Jan 30 '22 at 18:31
  • Thank you @toni057 for the quick response. I have tried tree-based models and Random Forest, but they still give me similar results. I have 17K observations with a 90/10 split for the target and am using 105 features. I used SMOTE and SMOTEENN and tried changing the probability threshold for the predictions, but false positives are still higher than true positives, and precision does not exceed 0.20 in any of these runs. The highest AUC score is 0.70. – LBala Feb 01 '22 at 14:37
  • If anyone has faced this issue and come up with a solution that worked, I would really appreciate your input. Thank you. – LBala Feb 03 '22 at 18:09
  • With imbalanced data, the confusion matrix will often show more false positives than true positives, and the greater the imbalance, the worse it gets. If you look at rates (i.e. the ROC curve), you should see the TPR higher than the FPR. Try looking at some other charts, like cumulative gains, to get more intuition. That said, given an AUC (AUROC, I presume) of 0.7, it seems you either don't have a very predictable problem at hand or just need better features. – toni057 Feb 04 '22 at 19:14
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 08 '22 at 19:50
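
To make the point about imbalance in the comments concrete, here is a minimal sketch of the arithmetic. The TPR of 0.80 and FPR of 0.25 are made-up illustrative values, not numbers taken from the question, and the helper names `confusion_shares` and `precision` are hypothetical. It shows that the same classifier, with TPR well above FPR, can still produce more false positives than true positives once the positive class is only 10% of the data.

```python
def confusion_shares(prevalence, tpr, fpr):
    """Return (TP, FP, FN, TN) as fractions of all observations.

    prevalence: share of positives in the evaluated data
    tpr: true positive rate (recall); fpr: false positive rate
    """
    tp = prevalence * tpr
    fn = prevalence * (1 - tpr)
    fp = (1 - prevalence) * fpr
    tn = (1 - prevalence) * (1 - fpr)
    return tp, fp, fn, tn


def precision(prevalence, tpr, fpr):
    """Precision = TP / (TP + FP), which depends on prevalence."""
    tp, fp, _, _ = confusion_shares(prevalence, tpr, fpr)
    return tp / (tp + fp)


# Assumed, illustrative operating point: TPR = 0.80, FPR = 0.25.
# Compare a balanced set (50% positives) with a 90/10 set.
for prev in (0.50, 0.10):
    tp, fp, fn, tn = confusion_shares(prev, 0.80, 0.25)
    print(
        f"prevalence={prev:.2f}  TP={tp:.3f}  FP={fp:.3f}  "
        f"FN={fn:.3f}  TN={tn:.3f}  "
        f"precision={precision(prev, 0.80, 0.25):.3f}"
    )
```

Under these assumptions, TPR and FPR are identical in both rows; only the class balance changes, and with it the precision. This is why comparing raw FP and TP counts on a 90/10 test set can look worse than the ROC curve suggests.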

0 Answers