I am working on a multi-class classification problem with five classes in the target column. I have generated features for the categorical variables using expanding mean encoding (target encoding). This method encodes each value of a categorical variable with the mean of the target variable over the preceding rows that share that value.
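This also produces some NaN values, for example in the 'Transaction-Type_mean_target' column. The first occurrence of each category has no preceding rows, so its encoding works out to 0/0.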
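A small sketch of where the NaN values come from, using the same expanding computation as the function below on a hypothetical toy frame. The data and the integer coding of 'Complaint-Status' as classes 0 to 4 are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data: one categorical column and an integer-coded target (classes 0-4)
train = pd.DataFrame({
    "Transaction-Type": ["A", "B", "A", "A", "B", "C"],
    "Complaint-Status": [1, 3, 2, 4, 0, 2],
})

col = "Transaction-Type"
# Sum of the target over earlier rows with the same category (current row excluded)
cumsum = train.groupby(col)["Complaint-Status"].cumsum() - train["Complaint-Status"]
# Number of earlier rows with the same category (0 for the first occurrence)
cumcnt = train.groupby(col).cumcount()
# First occurrences give 0/0, which pandas turns into NaN
train[col + "_mean_target"] = cumsum / cumcnt

print(train)
# Transaction-Type_mean_target: NaN, NaN, 1.0, 1.5, 3.0, NaN
```

Every category ('A', 'B', 'C') gets exactly one NaN: the row where it appears for the first time.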
What is the best way to fill these NaN values? Should I fill them with the column mean?
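One common option, shown here as a sketch rather than the definitive answer, is to fill the NaN values with the global mean of the target over the training set. This differs slightly from the "column mean" (the mean of the encoded column itself). The reasoning is that a row with no history for its category falls back to the overall prior. The frame below is hypothetical and mirrors the toy example's output.

```python
import numpy as np
import pandas as pd

# Hypothetical already-encoded training frame; NaN marks first occurrences
train = pd.DataFrame({
    "Transaction-Type": ["A", "B", "A", "A", "B", "C"],
    "Complaint-Status": [1, 3, 2, 4, 0, 2],
    "Transaction-Type_mean_target": [np.nan, np.nan, 1.0, 1.5, 3.0, np.nan],
})

# Overall mean of the target on the training set, used as a neutral prior
global_mean = train["Complaint-Status"].mean()

# Replace the missing encodings with that prior
train["Transaction-Type_mean_target"] = train["Transaction-Type_mean_target"].fillna(global_mean)
```

Here the global mean is 12 / 6 = 2.0, so the three NaN rows become 2.0. Filling with the mean of the encoded column instead would give 1.8333.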
How do I generate the mean encoding for my test data, given that the target (dependent) variable 'Complaint-Status' is not present there?
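A minimal sketch of one usual approach, assuming the test set is encoded only from training statistics. It computes the per-category target mean on the full training set, maps those means onto the test rows, and gives categories never seen in training the global training mean. The frames and the unseen category 'D' are hypothetical.

```python
import pandas as pd

# Hypothetical training frame with an integer-coded target
train = pd.DataFrame({
    "Transaction-Type": ["A", "B", "A", "A", "B", "C"],
    "Complaint-Status": [1, 3, 2, 4, 0, 2],
})
# Hypothetical test frame: no 'Complaint-Status' column, and 'D' never occurs in train
test = pd.DataFrame({"Transaction-Type": ["A", "C", "D"]})

col = "Transaction-Type"
target = "Complaint-Status"

# Fallback for categories that do not appear in the training data
global_mean = train[target].mean()

# Per-category target mean over the whole training set (no expanding window needed here)
category_means = train.groupby(col)[target].mean()

# Look up each test value's training mean; unseen values become NaN, then the global mean
test[col + "_mean_target"] = test[col].map(category_means).fillna(global_mean)
```

Because the test target is never used, nothing leaks from the test set. For several columns, the same three lines can be looped over a list such as `cat_var`.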
Input data:
Generating the mean encoding:
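```python
def add_feat_mean_encoding(col_list):
    """
    Expanding mean encoding
    """
    for i in col_list:
        # Sum of the target over earlier rows with the same value (current row excluded)
        cumsum = train.groupby(i)['Complaint-Status'].cumsum() - train['Complaint-Status']
        # Number of earlier rows with the same value (0 for the first occurrence)
        cumcnt = train.groupby(i).cumcount()
        # First occurrences give 0/0, i.e. NaN
        train[i+'_mean_target'] = cumsum/cumcnt

cat_var = ['Transaction-Type','Complaint-reason','Company-response','Consumer-disputes']
add_feat_mean_encoding(cat_var)
```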