What to do if we have multiple multi-class categorical columns?

Question

If a dataset has multiple categorical columns, do we need to perform one-hot encoding on all of them, and how do we then avoid the dummy variable problem?

It's not clear what you're asking for. Yes, you can one-hot encode all of your categorical variables, what is the _problem of dummy variable_? — G. Anderson, Jun 12 '19 at 18:12
Please see Stack Overflow's guidelines for asking a proper question: https://stackoverflow.com/help/how-to-ask — Julian Silvestri, Jun 12 '19 at 19:04

score 0 · Answer 1 · answered Jun 12 '19 at 18:15

It is not clear from your question what you are trying to achieve. Typically in machine learning you can use one-hot encoding or label encoding; either way, you need to encode categorical values before passing them to a model.

IMO, label encoding is simpler and can help with classical regression and classification models, while one-hot encoding can be used when you are planning to apply deep learning. More discussion at: https://datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor

Having said that, you could do label encoding like this:

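from sklearn.preprocessing import LabelEncoder

# model_data_df is assumed to be an existing pandas DataFrame.
# Select the non-numeric columns through the public API rather than the private _get_numeric_data().
categorical_cols = model_data_df.select_dtypes(exclude="number").columns
encoders = {}  # one fitted encoder per column, so each mapping can be inverted later
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    model_data_df[col] = encoders[col].fit_transform(model_data_df[col].astype(str))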
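
If you go the one-hot route instead, here is a minimal sketch. It assumes that the "problem of dummy variable" in the question means the dummy variable trap, where keeping one dummy column per category makes the dummies of a column perfectly collinear. The toy DataFrame and its column names are made up for illustration; `pd.get_dummies` with `drop_first=True` encodes every listed categorical column and drops one level from each.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Pick every non-numeric column as categorical.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# One-hot encode all categorical columns at once.
# drop_first=True removes the first level of each column,
# which avoids perfectly collinear dummy columns.
encoded = pd.get_dummies(df, columns=cat_cols, drop_first=True)

print(encoded.columns.tolist())
```

Each categorical column with k distinct values becomes k - 1 indicator columns, and the numeric column is passed through unchanged. Whether dropping a level matters depends on the model: it is mostly relevant for unregularized linear models.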
