
I am analysing the effect of SMOTE on the performance of Random Forest and Logistic Regression. I am using a dataset from Kaggle with around 50,000 observations and 58 variables. I trained four models on it:

  1. Random Forest
  2. Random Forest with SMOTE
  3. Logistic Regression
  4. Logistic Regression with SMOTE
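For reference, models 2 and 4 differ from 1 and 3 only in the oversampling step applied to the training data. A minimal sketch of that step follows, assuming the standard SMOTE idea of placing a synthetic point on the line segment between a minority sample and one of its k nearest minority neighbours. The names `smote_sketch`, `n_new`, `k` and the toy data are hypothetical and only illustrate the mechanism; this is not any library's implementation.

```python
import math
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolation.

    Assumes `minority` holds at least two distinct samples, each a list
    of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: the new point lies on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already covered by the minority class.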

I got the following results:

[image: results for the four models]

$\text{G-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}}$
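A small sketch of how such a metric can be computed from confusion-matrix counts, assuming it is the geometric mean of sensitivity and specificity. The function name `g_mean` and the counts below are made up for illustration and are not the results from my models.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity from raw counts."""
    sensitivity = tp / (tp + fn)  # true positive rate on the minority class
    specificity = tn / (tn + fp)  # true negative rate on the majority class
    return math.sqrt(sensitivity * specificity)

# A classifier that does reasonably well on both classes (made-up counts).
score_balanced = g_mean(tp=80, fn=20, tn=90, fp=10)

# A classifier that always predicts the majority class (made-up counts).
score_majority_only = g_mean(tp=0, fn=100, tn=900, fp=0)
```

The second case shows why this kind of metric is sensitive to imbalance: a model that never predicts the minority class scores zero, however high its accuracy.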

Question: Why does Logistic Regression improve so much with SMOTE, while Random Forest improves only slightly?

My thought was that it may be due to the high dimensionality, but I would expect the Random Forest to do better than the Logistic Regression.

RasM10
