Please post the complete stack trace along with some reproducible code and data so that we can check this. This looks like an easy problem to solve, and it can be sorted out if you explain how you intend to approach it.
Other than that, there are multiple issues here:
1) OneHotEncoder cannot be used directly on strings. You first need to convert your string features to integers (for example with LabelEncoder).
2) One-hot encoding transforms your single column into multiple columns (one per unique value in it), so you cannot assign the result directly to a single column of your dataframe.
3) Even if you do manage to transform the data with OneHotEncoder, the result is a sparse matrix by default, which again does not fit well into a pandas dataframe.
4) You are assigning the same data to the same dataframe twice. Inside the method you do this:
repair[field]=oe.transform(repair[field])
And then you call the method like this:
repair['SALES_ORG_ID']=OneHotEncoder(repair,'SALES_ORG_ID')
This is unnecessary.
5) You are first fitting (or trying to fit) all the data in the field, so oe.classes_ will already contain all the unique categories. After that, doing
repair[field] = repair[field].map(lambda s: 'Other' if s not in oe.classes_ else s)
does not make any sense. Can you show how you are doing this successfully with LabelEncoder, as you said in the question? Even if you somehow get this to work, the next line:
repair[field]=oe.transform(repair[field])
will throw an error, because 'Other' is a string, which OneHotEncoder does not handle. You need to add the extra category ('Other' in this case) before fitting the data.
6) I would suggest saving the transformers with joblib or pickle instead of numpy.
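To make the points above concrete, here is a minimal sketch of one way to put them together. It is not your original code: the helper names fit_encoders and one_hot_column, the sample dataframes and the file name encoders.joblib are all made up for illustration. It adds 'Other' before fitting, encodes strings to integers first, turns the sparse result into a dense array, builds one column per category, and saves the encoders with joblib.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def fit_encoders(values):
    """Fit a LabelEncoder and a OneHotEncoder that both know about 'Other'."""
    le = LabelEncoder()
    # Add 'Other' BEFORE fitting so that it gets its own integer code.
    le.fit(list(values) + ["Other"])
    ohe = OneHotEncoder()
    # Fit on every integer code so 'Other' has a column even if never seen.
    ohe.fit(np.arange(len(le.classes_)).reshape(-1, 1))
    return le, ohe


def one_hot_column(df, field, le, ohe):
    """Return a copy of df with `field` replaced by one column per category."""
    known = set(le.classes_)
    # Map categories that were not seen during fitting to 'Other'.
    cleaned = df[field].map(lambda s: s if s in known else "Other")
    codes = le.transform(cleaned).reshape(-1, 1)
    dense = ohe.transform(codes).toarray()  # sparse matrix -> dense array
    names = [f"{field}_{c}" for c in le.classes_]
    encoded = pd.DataFrame(dense, columns=names, index=df.index)
    return pd.concat([df.drop(columns=[field]), encoded], axis=1)


# Hypothetical sample data, only for illustration.
train = pd.DataFrame({"SALES_ORG_ID": ["A", "B", "A"], "qty": [1, 2, 3]})
new = pd.DataFrame({"SALES_ORG_ID": ["B", "Z"], "qty": [4, 5]})

le, ohe = fit_encoders(train["SALES_ORG_ID"])
train_encoded = one_hot_column(train, "SALES_ORG_ID", le, ohe)
new_encoded = one_hot_column(new, "SALES_ORG_ID", le, ohe)  # 'Z' -> 'Other'

# Persist the fitted encoders with joblib rather than numpy.
path = os.path.join(tempfile.mkdtemp(), "encoders.joblib")
joblib.dump((le, ohe), path)
le_loaded, ohe_loaded = joblib.load(path)
```

Note that the function returns a new dataframe instead of assigning into a single column, so there is no need to assign the result a second time when calling it.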
Note: As mentioned in the changelog here, from the next version (0.20.0), OneHotEncoder will be able to handle strings in the passed data:
String or pandas Categorical columns can now be encoded with OneHotEncoder or OrdinalEncoder.
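For reference, a minimal sketch of what that looks like, assuming scikit-learn 0.20 or later is installed. The sample dataframes are made up for illustration. Here handle_unknown="ignore" is used as an alternative to an explicit 'Other' category: a value not seen during fitting is encoded as a row of all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data, only for illustration.
train = pd.DataFrame({"SALES_ORG_ID": ["A", "B", "A"]})
new = pd.DataFrame({"SALES_ORG_ID": ["B", "Z"]})

# String columns can be passed directly; no LabelEncoder step is needed.
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train[["SALES_ORG_ID"]])  # 2-D input: one column per feature

# The result is sparse by default, so convert it before using it with pandas.
dense = enc.transform(new[["SALES_ORG_ID"]]).toarray()
```

The fitted categories are available as enc.categories_, which can be used to name the resulting columns.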