Sklearn Label Encoder - Not getting desired output based on prediction and inverse transform

Question

I'm new to machine learning in Python with scikit-learn. I was working on building a model from three columns: pets, owner and location.

import pandas
import joblib
from sklearn.tree import DecisionTreeClassifier
from collections import defaultdict
from sklearn import preprocessing 

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'], 
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'], 
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego', 
                 'New_York']
})

Now, I'm encoding the entire DataFrame with the label encoder.

le = preprocessing.LabelEncoder()
df_encoded = df.apply(le.fit_transform)
df_array=df_encoded.values

Now, I'm splitting the encoded array into an input set (pets and owner) and an output set (location).

IpSet = df_array[:,0:2]
Opset = df_array[:,2:3]

Then, I create a new decision tree classifier and fit it on the input and output sets.

model = DecisionTreeClassifier()
model.fit(IpSet,Opset)

Now, I'm trying to use the model to predict the location for a new DataFrame. I'm using the same label encoder as before.

df_Predict = pandas.DataFrame({
    'pets': ['cat'], 
    'owner': ['Champ']})
df_encoded_Predict = df_Predict.apply(le.fit_transform)
predictions_train = model.predict(df_encoded_Predict)
print(le.inverse_transform(predictions_train)[:1])

With this, I expect to see the value 'San_Diego'. I'm not sure why I'm getting 'Champ' as the output.

Could someone help me with this?

Don't `fit` transformers on your test data; call `fit` or `fit_transform` only on the training input. Then, at prediction time, call `transform` with the fitted transformer — G. Anderson, Jan 18 '22 at 17:48
Also, you should be using `le.fit_transform(df)` not `df.apply(...)` — G. Anderson, Jan 18 '22 at 17:49
@G.Anderson, I don't think I'm following you. 1) Could you explain the logic of calling fit_transform only on the input? Should I split the df into the input and output sets even before label encoding? 2) When I do le.fit_transform(df), it only works on a 1D array, but I'm trying to label-encode the entire input set. — ItsMeGokul, Jan 18 '22 at 18:00
I think you should refer to the LabelEncoder API to see how to apply the fitted labels to the test data. — Qiyu Zhong, Jan 18 '22 at 22:35
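On the second point in the comments above: `LabelEncoder` is meant for a 1D target, which is why `le.fit_transform(df)` fails on a whole DataFrame. A common alternative for the feature columns, swapped in here and not suggested in the thread, is scikit-learn's `OrdinalEncoder`, which accepts 2D input and learns a separate category list per column; `LabelEncoder` is then used only for the target. This is a minimal sketch assuming scikit-learn 0.20 or later; the names `features`, `feature_encoder` and `target_encoder` are illustrative.

```python
import pandas
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})
features = ['pets', 'owner']

# OrdinalEncoder accepts 2D input and learns one category list per column.
feature_encoder = OrdinalEncoder()
X_train = feature_encoder.fit_transform(df[features])

# LabelEncoder is intended for the 1D target column.
target_encoder = LabelEncoder()
y_train = target_encoder.fit_transform(df['location'])

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

df_Predict = pandas.DataFrame({'pets': ['cat'], 'owner': ['Champ']})
# Only transform here: both encoders keep what they learned from training data.
X_new = feature_encoder.transform(df_Predict[features])
predicted = target_encoder.inverse_transform(model.predict(X_new))
print(predicted)  # ['San_Diego']
```

Because ('cat', 'Champ') appears in the training data and the unrestricted tree fits its training rows exactly, the decoded prediction is the location from that row.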

score 0 · Answer 1 · answered Jan 19 '22 at 17:13

The logic you're following is not correct.

    df_encoded = df.apply(le.fit_transform)

Here the same encoder (`le`) is refitted on every column, so after this line executes, `le` only holds the encoding for the last column, location.
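To see this concretely, the sketch below rebuilds the question's data and inspects `le.classes_` after each `apply` call. It assumes a recent pandas and scikit-learn; `train_classes` is an illustrative name, not part of the original code.

```python
import pandas
from sklearn import preprocessing

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})

le = preprocessing.LabelEncoder()
df_encoded = df.apply(le.fit_transform)

# apply() runs le.fit_transform column by column and every call overwrites
# the previous fit, so only the last column ('location') is remembered.
train_classes = list(le.classes_)
print(train_classes)  # ['New_York', 'San_Diego']

df_Predict = pandas.DataFrame({'pets': ['cat'], 'owner': ['Champ']})
df_encoded_Predict = df_Predict.apply(le.fit_transform)

# Refitting on the new rows replaces the state again: 'cat' and 'Champ'
# are each encoded as 0, and the encoder now only knows ['Champ'].
print(df_encoded_Predict.values.tolist())  # [[0, 0]]
print(list(le.classes_))  # ['Champ']
```

So the model receives codes that don't match its training encoding, and `inverse_transform` decodes the prediction with an encoder whose only class is 'Champ'.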

When you need to use an already fitted encoder, call its .transform() method instead of the following:

       df_encoded_Predict = df_Predict.apply(le.fit_transform)
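Since a single shared `le` only remembers the last column, calling `le.transform` on the new rows would still fail (it raises a `ValueError` for labels such as 'cat' that it has never seen). The sketch below keeps one encoder per column with `defaultdict(LabelEncoder)`, using the `defaultdict` the question already imports, fits them on the training data only, and reuses them with `.transform()` at prediction time. The dictionary name `encoders` is hypothetical, and this is one way to apply the answer's advice, not the answerer's own code.

```python
from collections import defaultdict

import pandas
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})

# A new LabelEncoder is created the first time each column name is looked up.
encoders = defaultdict(LabelEncoder)

# Fit each column's own encoder on the training data only.
df_encoded = df.apply(lambda col: encoders[col.name].fit_transform(col))

X_train = df_encoded[['pets', 'owner']]
y_train = df_encoded['location']

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

df_Predict = pandas.DataFrame({'pets': ['cat'], 'owner': ['Champ']})
# Reuse the fitted encoders: transform, not fit_transform.
df_encoded_Predict = df_Predict.apply(lambda col: encoders[col.name].transform(col))

predictions = model.predict(df_encoded_Predict)
# Decode with the encoder that was fitted on the target column.
print(encoders['location'].inverse_transform(predictions))  # ['San_Diego']
```

Slicing `X_train` from the DataFrame keeps the column names `pets` and `owner` identical between `fit` and `predict`, so scikit-learn does not warn about mismatched feature names.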
