I'm new to machine learning in Python with scikit-learn. I'm working on a model built from three columns: pets, owner and location.
import pandas
import joblib
from sklearn.tree import DecisionTreeClassifier
from collections import defaultdict
from sklearn import preprocessing
df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego', 'San_Diego',
                 'New_York']
})
Next, I encode the entire DataFrame with a single label encoder.
le = preprocessing.LabelEncoder()
df_encoded = df.apply(le.fit_transform)
df_array=df_encoded.values
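To see what the encoder remembers after this step, here is a small self-contained sketch. It relies on `LabelEncoder` replacing its `classes_` on every call to `fit_transform`, so after `apply` has gone through all columns only the last column's labels remain. The name `last_classes` is only for illustration.

```python
import pandas
from sklearn import preprocessing

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego',
                 'San_Diego', 'New_York'],
})

le = preprocessing.LabelEncoder()
# apply() calls le.fit_transform once per column, refitting le each time
df_encoded = df.apply(le.fit_transform)

# Only the classes of the most recently fitted column (location) are kept
last_classes = list(le.classes_)
print(last_classes)
```

So at this point the one encoder can only decode location values; the mappings for pets and owner are gone.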
Then I split the encoded array into an input set (pets and owner) and an output set (location).
IpSet = df_array[:,0:2]
Opset = df_array[:,2:3]
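As a side note on the slicing, here is a small sketch with a made-up array (not my real data, and the variable names are hypothetical). It shows that the slice `2:3` keeps the target as a 2-D column, while the plain index `2` gives a 1-D array, which is the more common shape for a single target in scikit-learn.

```python
import numpy as np

# Hypothetical stand-in for the encoded array: columns are pets, owner, location
encoded = np.array([[0, 1, 1],
                    [1, 2, 0],
                    [0, 0, 0]])

inputs = encoded[:, 0:2]     # first two columns, shape (3, 2)
target_2d = encoded[:, 2:3]  # slice keeps a column dimension, shape (3, 1)
target_1d = encoded[:, 2]    # plain index drops it, shape (3,)

print(inputs.shape, target_2d.shape, target_1d.shape)
```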
Then I create a decision tree classifier and fit it on the input and output sets.
model = DecisionTreeClassifier()
model.fit(IpSet,Opset)
Now I'm trying to predict the location for a new DataFrame with this model, using the same label encoder as before.
df_Predict = pandas.DataFrame({
'pets': ['cat'],
'owner': ['Champ']})
df_encoded_Predict = df_Predict.apply(le.fit_transform)
predictions_train = model.predict(df_encoded_Predict)
print(le.inverse_transform(predictions_train)[:1])
With this, I'm expecting to see the value 'San_Diego', but I'm getting 'Champ' as the output and I'm not sure why.
Could someone help me through this?
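For comparison, here is a minimal sketch of one possible approach. It assumes the cause is that the single encoder is refit on every column, and again on the prediction frame, so it ends up knowing only the last labels it saw. The sketch keeps one `LabelEncoder` per column in a dict (the name `encoders` is just illustrative), fits each one only on the training data, and uses `transform` rather than `fit_transform` at prediction time.

```python
import pandas
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

df = pandas.DataFrame({
    'pets': ['cat', 'dog', 'cat', 'monkey', 'dog', 'dog'],
    'owner': ['Champ', 'Ron', 'Brick', 'Champ', 'Veronica', 'Ron'],
    'location': ['San_Diego', 'New_York', 'New_York', 'San_Diego',
                 'San_Diego', 'New_York'],
})

# One encoder per column, each fitted once on the training data
encoders = {col: preprocessing.LabelEncoder().fit(df[col]) for col in df.columns}
encoded = pandas.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df.columns}
)

model = DecisionTreeClassifier(random_state=0)
model.fit(encoded[['pets', 'owner']].values, encoded['location'].values)

df_predict = pandas.DataFrame({'pets': ['cat'], 'owner': ['Champ']})
# transform (not fit_transform) reuses the mapping learned from the training data
encoded_predict = pandas.DataFrame(
    {col: encoders[col].transform(df_predict[col]) for col in df_predict.columns}
)

predicted = model.predict(encoded_predict.values)
# Decode with the encoder that belongs to the location column
result = encoders['location'].inverse_transform(predicted)
print(result[:1])
```

With this layout, `transform` raises a `ValueError` for a pet or owner that was not in the training data, so unseen values would need separate handling. The scikit-learn documentation also describes `LabelEncoder` as meant for target labels, with `OrdinalEncoder` being the encoder intended for input features, which may be worth considering here.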