
I am building a predictive model for a regression problem, but the model does not learn well because the target variable contains a large number of 0s.

To address this imbalance in the target variable, I turned to SMOTER (SMOTE for Regression).

The code is described below.

!pip install resreg
!pip install smogn
import resreg
import smogn
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.title / plt.legend / plt.show below

X_smoter, y_smoter = resreg.smoter(
    X_train, y_train,
    relevance=relevance,
    k=5,
    over="balance",
    under=0.33,
    random_state=0,
    nominal=None)


sns.kdeplot(data=y_train,label="org")
sns.kdeplot(data=y_smoter,label='smoter')
plt.title('Distribution plots of target variables')
plt.legend()
plt.show()

[Image: distribution plots of the target variable, original ("org") vs. after SMOTER ("smoter")]

As a result, the distribution of the target variable is no longer concentrated at 0.

However, in the resampled data the categorical variable gender (encoded as 0 for male and 1 for female) now contains continuous values such as 0.3.
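To show where a value like 0.3 could come from, here is a minimal NumPy sketch of the interpolation step that SMOTE-style methods are generally built on. It is only an illustration under my own assumptions: the two rows, the [age, gender] column layout, and the variable names are hypothetical, and I have not checked this against the resreg source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rows: [age, gender], with gender encoded as 0 (male) / 1 (female).
x_i = np.array([30.0, 0.0])
x_nn = np.array([40.0, 1.0])  # a neighbor of x_i that has the other gender

# SMOTE-style interpolation: pick a random point on the segment between the rows.
gap = rng.uniform(0.0, 1.0)
x_new = x_i + gap * (x_nn - x_i)

# Every column is interpolated the same way, so the gender column
# ends up as a fraction instead of staying at 0 or 1.
print(x_new)
```

If every feature is treated as continuous, any synthetic row generated between neighbors with different genders would get a fractional gender value in this way.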

Is there a way to augment the data with SMOTER while keeping categorical variables categorical, rather than having them turn into continuous values?

Dai