I have a data set where there are columns that are of type object and others of type int or float. I understand that I need to convert the object columns to dummy variables but I also have some int and float columns that represent binary data (already 0 and 1). Will sklearn interpret these columns as categorical or not? I do not want these to be treated as continuous variables.

CypherX

Altamash Rafiq
"sklearn" has functions and classes that process data. "Sklearn" does not interpret your dataframe as such – Mad Physicist Oct 25 '19 at 00:40
1 Answer
OneHotEncoder does not detect which columns are categorical. Every column that is passed to OneHotEncoder is converted into dummy variables, regardless of its dtype.
You can refer to the examples here.
If a variable is already binary (0 and 1), it doesn't make sense to create two dummy variables for it, so leave those columns out of the encoder.
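To illustrate the point, here is a minimal sketch using a made-up one-column frame (the data and variable names are only for illustration). It assumes a reasonably recent scikit-learn and shows that a 0/1 column passed to OneHotEncoder comes back as two dummy columns carrying the same information.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: 'groups' is already a 0/1 indicator.
X = pd.DataFrame({'groups': [0, 1, 0]})

# OneHotEncoder treats every input column as categorical, numeric or not.
# The default output is a sparse matrix, so convert it for inspection.
encoded = OneHotEncoder().fit_transform(X).toarray()

print(encoded.shape)  # one column per distinct value in 'groups'
```

The single binary column becomes two columns, one for each of the values 0 and 1, which is redundant for a model.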
You can use make_column_transformer to specify which columns should be one-hot encoded.
Example:
>>> import pandas as pd
>>> from sklearn.compose import make_column_transformer
>>> from sklearn.preprocessing import OneHotEncoder
>>> X = pd.DataFrame([['Male', 0], ['Female', 1], ['Female', 0]], columns=['gender', 'groups'])
>>> # Encode only column 0 ('gender'); add remainder='passthrough' to keep 'groups'
>>> ct = make_column_transformer((OneHotEncoder(), [0]))
>>> ct.fit_transform(X)
array([[0., 1.],
       [1., 0.],
       [1., 0.]])
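With the default remainder='drop', the binary 'groups' column is discarded from the result. A possible variation, sketched here on the same toy data, is to pass remainder='passthrough' so that the already-binary column is kept unchanged next to the encoded one. Selecting the column by name and setting sparse_threshold=0 (to force a dense array) are choices made for this sketch, not requirements.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame([['Male', 0], ['Female', 1], ['Female', 0]],
                 columns=['gender', 'groups'])

# One-hot encode only 'gender'; every other column is appended untouched.
ct = make_column_transformer(
    (OneHotEncoder(), ['gender']),
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
result = ct.fit_transform(X)

print(result)
```

The result has three columns: the two dummies for 'gender' followed by the original 'groups' values.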

Venkatachalam