I have a dataset with about 10 columns of discrete (categorical) data, and I'm having trouble transforming them into a form that machine learning algorithms can work with.

I was able to transform one column, which contains only YES/NO values, like this:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['ABC'] = le.fit_transform(X['ABC'])

and it seems to work.

However, if a column holds something other than YES/NO, for example a location with 10 different values, I only get errors.

from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=)
D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]
f = h.transform(D)
f.toarray()

I tried using FeatureHasher, but I'm not sure that's a good idea. I changed the example code to read data from my column, but got an error saying that the input can only be a dict.

I've also tried something like this:

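import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
X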

but that doesn't work either.

Could someone give me a tip or a link to a good tutorial? I've found a lot, but they don't seem to match my situation.

Krystian

1 Answer

You are almost there with ColumnTransformer and OneHotEncoder. Refer to the examples here (https://www.geeksforgeeks.org/prediction-using-columntransformer-onehotencoder-and-pipeline/) as well as their respective docs to get it working. Also, when you say it doesn't work, please share the actual error message.

Use OneHotEncoder for nominal categorical features (no natural order), and OrdinalEncoder for ordinal categorical features (values with a meaningful order).
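To illustrate the difference, here is a small sketch with invented values: "locations" stands in for a nominal feature and "sizes" for an ordinal one. Neither comes from your data, and the category order passed to OrdinalEncoder is an assumption you would replace with your own.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical values for illustration only. Both encoders expect 2D input.
locations = np.array([["north"], ["south"], ["east"], ["north"]])  # nominal
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])    # ordinal

# One column per category; categories are sorted: east, north, south.
one_hot = OneHotEncoder().fit_transform(locations).toarray()

# A single integer-valued column that follows the order given explicitly.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]]).fit_transform(sizes)

print(one_hot)
print(ordinal)
```

If the order is not passed explicitly, OrdinalEncoder sorts the categories, which for strings means alphabetical order and usually not the order you intend.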

A syntactically simpler option is pandas.get_dummies(), although it is typically used in notebooks and EDA rather than in a production environment.
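A minimal get_dummies sketch, again using a made-up DataFrame (the "location" and "age" columns are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration only.
X = pd.DataFrame({
    "location": ["north", "south", "east", "north"],
    "age": [25, 32, 47, 51],
})

# Replace "location" with one indicator column per distinct value.
X_dummies = pd.get_dummies(X, columns=["location"])

print(X_dummies.columns.tolist())
```

One reason it is less common in production: the resulting columns depend on which categories happen to be present in the data you pass in, so new data with a missing or unseen category can produce a different set of columns than the model was trained on.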

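You can also apply OneHotEncoder directly, much like the LabelEncoder lines you used initially, without ColumnTransformer. Note that OneHotEncoder expects 2D input (for example X[['ABC']] rather than X['ABC']) and returns one column per category, so the result cannot simply be assigned back to a single column. That could work for you as well.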