
If a dataset has multiple categorical columns, do we need to perform one-hot encoding on all of them, and if so, how do we deal with the dummy variable problem?

  • It's not clear what you're asking for. Yes, you can one-hot encode all of your categorical variables, what is the _problem of dummy variable_? – G. Anderson Jun 12 '19 at 18:12
  • please see stackoverflow's guideline for asking a proper question. https://stackoverflow.com/help/how-to-ask – Julian Silvestri Jun 12 '19 at 19:04

1 Answer


It is not clear from your question what you are trying to achieve. Typically in machine learning you can use either one-hot encoding or label encoding, but either way you need to encode categorical values before passing them to a model.

IMO, label encoding is simpler and can help in classical regression/classification, while one-hot encoding can be used when you are planning to apply deep learning. More discussion at: https://datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor

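Having said that, you could do label encoding like this (model_data_df is assumed to be your existing pandas DataFrame):

from sklearn.preprocessing import LabelEncoder

# Keep one fitted encoder per column so each mapping can be inverted later
encoders = {}
# Treat every non-numeric, non-boolean column as categorical
categorical_cols = model_data_df.select_dtypes(exclude=["number", "bool"]).columns
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    model_data_df[col] = encoders[col].fit_transform(model_data_df[col].astype(str))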
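
If you go the one-hot route from the question instead, here is a minimal sketch using pandas' get_dummies. The toy DataFrame and its column names are made up for illustration and are not from the question. Passing drop_first=True drops one level per categorical column, which is one common way to avoid the perfectly collinear dummy columns usually meant by the "dummy variable trap".

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Treat every non-numeric, non-boolean column as categorical.
categorical_cols = list(df.select_dtypes(exclude=["number", "bool"]).columns)

# One-hot encode all categorical columns at once.
# drop_first=True removes the first level of each column, so the remaining
# dummy columns are no longer perfectly collinear with an intercept.
encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

print(encoded.columns.tolist())
```

Whether you need to drop a level depends on the model: it matters mostly for unregularized linear models, while tree-based models are generally not affected.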
rlpatrao