I have a dataset with about 10 columns of discrete data, and I'm having trouble transforming them into a form that machine learning algorithms can work with.
I was able to transform one column, which contains only YES/NO values, this way:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['ABC'] = le.fit_transform(X['ABC'])
and that seems to work.
However, if a column holds something other than YES/NO, for example a location with 10 different values, I only get errors.
from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=)
D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]
f = h.transform(D)
f.toarray()
I tried using FeatureHasher, but I'm not sure that's a good idea. I changed the example code to take its data from my column, but got an error saying the input can only be a dict.
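For reference, a minimal sketch of how FeatureHasher can be fed a single column of strings instead of dicts. The DataFrame, the column name `location`, and `n_features=8` are made up for illustration and are not from my real data; the relevant part is `input_type="string"`, which expects each sample to be an iterable of strings.

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

# Hypothetical data: 'location' stands in for the real column name.
df = pd.DataFrame({"location": ["Paris", "Berlin", "Rome", "Paris"]})

# With input_type="string", every sample must be an iterable of strings,
# so each cell value is wrapped in a one-element list.
# n_features=8 is an arbitrary choice for this sketch.
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform([[value] for value in df["location"]])

# The result is a sparse matrix; convert it to a dense array to inspect it.
features = hashed.toarray()
```

Note that hashing can map different categories to the same column, so with only about 10 distinct values a plain one-hot encoding is probably the simpler choice.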
I've also tried something like this:
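import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
X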
but that doesn't work either.
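A minimal, self-contained sketch of the one-hot approach on made-up data, in case it helps to compare. The column names `age` and `location` are placeholders. It assumes that one possible cause of trouble is the sparse matrix that ColumnTransformer may return (wrapping that in `np.array` does not give a regular 2-D array), so it sets `sparse_threshold=0` to always get a dense result.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; the column names are placeholders.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "location": ["Paris", "Berlin", "Rome", "Paris"],
})

ct = ColumnTransformer(
    # Select the categorical column by name rather than by position.
    transformers=[
        ("encoder", OneHotEncoder(handle_unknown="ignore"), ["location"]),
    ],
    remainder="passthrough",  # keep the other columns unchanged
    sparse_threshold=0,       # always return a dense NumPy array
)

# One-hot columns come first, passthrough columns are appended after them.
encoded = ct.fit_transform(df)
```

Selecting columns by name only works when the input is a DataFrame; with a plain NumPy array, integer positions such as `[1]` are needed instead.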
Could someone give me a tip or a link to a good tutorial? I've found a lot, but they don't seem to match my situation.