I'm trying to use a model I built with the Decision Tree algorithm to make predictions for a DataFrame.
The model scores 0.96 on the test set. When I then use it to predict on the DataFrame of people who are still at the company, I get an error. The goal is to predict which of the people who have stayed so far will leave the company in the future.
How can I achieve that?
Here is what I did:
- Read the DataFrame from my GitHub and split it into people who left and people who did not
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/bhaskoro-muthohar/DataScienceLearning/master/HR_comma_sep.csv')
leftdf = df[df['left'] == 1]
notleftdf = df[df['left'] == 0]
- Prepare the data for model generation
df.salary = df.salary.map({'low':0,'medium':1,'high':2})
df.salary
X = df.drop(['left','sales'],axis=1)
y = df['left']
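As a side note on the preparation step, here is a minimal sketch of one alternative ordering: encode `salary` once on the full frame and only then split into `leftdf` and `notleftdf`, so the mapping does not have to be repeated later. The tiny DataFrame and its values are hypothetical stand-ins for the HR data, used only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical miniature of the HR data, for illustration only.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "high", "low"],
    "sales": ["sales", "hr", "IT", "sales"],
    "left": [1, 0, 0, 1],
})

# Encode salary once, on the full frame.
df["salary"] = df["salary"].map({"low": 0, "medium": 1, "high": 2})

# Split afterwards, so both subsets already carry the numeric salary.
leftdf = df[df["left"] == 1]
notleftdf = df[df["left"] == 0]

# Features and target for training, plus features for the later prediction.
X = df.drop(["left", "sales"], axis=1)
y = df["left"]
X_new = notleftdf.drop(["left", "sales"], axis=1)

print(X_new)
```

With this ordering, `X_new` has the same columns and encoding as `X` without a second `map` call.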
- Split the data into train and test sets
import numpy as np
from sklearn.model_selection import train_test_split
#splitting the train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)
- Train the model
from sklearn import tree
clftree = tree.DecisionTreeClassifier(max_depth=3)
clftree.fit(X_train,y_train)
- Evaluate the model
y_pred = clftree.predict(X_test)
print("Test set prediction:\n {}".format(y_pred))
print("Test set score: {:.2f}".format(clftree.score(X_test, y_test)))
The result is
Test set score: 0.96
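For context, the `score` of a classifier in scikit-learn is the mean accuracy on the given data. The sketch below shows what that number measures, and how it can differ from the fraction of actual leavers that are caught. The label and prediction arrays are hypothetical and not taken from the HR data.

```python
import numpy as np

# Hypothetical labels and predictions, not taken from the HR data.
y_test = np.array([0, 0, 0, 1, 1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 1, 0, 0, 0, 1])

# Mean accuracy: fraction of all rows predicted correctly.
accuracy = np.mean(y_pred == y_test)

# Fraction of actual leavers (label 1) that were predicted as leavers.
recall_left = np.mean(y_pred[y_test == 1] == 1)

print("accuracy: {:.3f}".format(accuracy))
print("recall for left == 1: {:.3f}".format(recall_left))
```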
- Then try to make a prediction using the DataFrame of people who have not yet left the company
X_new = notleftdf.drop(['left','sales'],axis=1)
#Map salary to 0,1,2
X_new.salary = X_new.salary.map({'low':0,'medium':1,'high':2})
X_new.salary
prediction_will_left = clftree.predict(X_new)
print("Prediction: {}".format(prediction_will_left))
print("Predicted target name: {}".format(
notleftdf['left'][prediction_will_left]
))
The error I got is:
KeyError: "None of [Int64Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n ...\n 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],\n dtype='int64', length=11428)] are in the [index]"
How can I solve it?
PS: The full script is linked here
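A possible reading of the error: `prediction_will_left` is an array of 0s and 1s, and `notleftdf['left'][prediction_will_left]` asks pandas to look those values up as row labels. Since `notleftdf` keeps the row labels of the original DataFrame, the labels 0 and 1 are probably not present in its index, hence the `KeyError`. Below is a minimal sketch of selecting the predicted leavers with a boolean mask instead. The small DataFrame and the prediction array are hypothetical stand-ins for `notleftdf` and the output of `clftree.predict(X_new)`, and `predicted_left` is a column name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for notleftdf: the index does not start at 0,
# because the filtered rows keep their original row labels.
notleftdf = pd.DataFrame(
    {"satisfaction_level": [0.58, 0.11, 0.82, 0.45], "left": [0, 0, 0, 0]},
    index=[2000, 2001, 2002, 2003],
)

# Hypothetical stand-in for the output of clftree.predict(X_new).
prediction_will_left = np.array([0, 1, 0, 1])

# Boolean mask: keep the rows where the model predicts the person will leave.
will_leave = notleftdf[prediction_will_left == 1]
print(will_leave)

# Alternatively, keep every row and attach the prediction as a new column.
with_pred = notleftdf.assign(predicted_left=prediction_will_left)
print(with_pred)
```

The mask works positionally, so it only requires that the prediction array has one entry per row of `notleftdf`, in the same order.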