If a dataset has multiple categorical columns, do we need to perform OneHotEncoding on all of the categorical data, and how do we then avoid the dummy variable problem?
-
It's not clear what you're asking for. Yes, you can one-hot encode all of your categorical variables, what is the _problem of dummy variable_? – G. Anderson Jun 12 '19 at 18:12
-
please see stackoverflow's guideline for asking a proper question. https://stackoverflow.com/help/how-to-ask – Julian Silvestri Jun 12 '19 at 19:04
1 Answer
It is not clear from your question what you are trying to achieve. Typically in machine learning you can use either one-hot encoding or label encoding, but either way you need to encode categorical values before passing them to a model.
IMO, label encoding is simpler and can help in classical regression/classification, while one-hot encoding can be used when you are planning to apply deep learning. More discussion at: https://datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor
Having said that, you could do label encoding like this:
    from sklearn.preprocessing import LabelEncoder

    # model_data_df is assumed to be an existing pandas DataFrame
    # Label-encode every non-numeric column, with a fresh encoder per column
    non_numeric_cols = model_data_df.select_dtypes(exclude="number").columns
    for col in non_numeric_cols:
        model_data_df[col] = LabelEncoder().fit_transform(model_data_df[col].astype(str))
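If the "problem of dummy variable" in the question means the dummy variable trap (one-hot columns for a feature always sum to 1, so they are perfectly collinear), a common workaround is to drop one level per categorical column. Here is a minimal sketch of that, assuming pandas is available; the toy DataFrame and its column names are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Treat every non-numeric column as categorical (an assumption for this sketch)
categorical_cols = list(df.select_dtypes(exclude="number").columns)

# One-hot encode all categorical columns at once; drop_first=True removes
# the first level of each column so the remaining dummies are not collinear
encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

print(encoded.columns.tolist())
```

The dropped level becomes the implicit baseline for that column. Whether you actually need to drop a level depends on the model: it matters mostly for unregularized linear models, and much less for tree-based models.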

rlpatrao