Accuracy score in Decision Tree

Question

Part 1

decision_tree.fit(X_train, y_train)
Y_val = decision_tree.predict(X_val)
acc_decision_tree_train = round(decision_tree.score(X_train, y_train) * 100, 2)
acc_decision_tree_train

Part 2

acc_decision_tree_val = round(decision_tree.score(X_val, y_val) * 100, 2)
print('accuracy:', acc_decision_tree_val)

Part 3

con_mat=confusion_matrix(y_val, Y_val)  # Y_val holds the validation predictions from Part 1
sns.heatmap(con_mat,annot=True,annot_kws= {"size":20},cmap="viridis")
plt.show()

Part 4

acc_decision_tree_test = round(decision_tree.score(X_test, y_test) * 100, 2)
print('accuracy:', acc_decision_tree_test)
Y_pred_test = decision_tree.predict(X_test)

The code above has four parts.

Q1 -> Part 1 fits on the training data and predicts on the validation data. The model learns by fitting on X_train, but I never call predict on X_train. How can I get an accuracy score for the training data in that case (the model is only learning at that point, right)?

Q2 -> In Part 2, since I already ran "Y_val = decision_tree.predict(X_val)" above, I can calculate the validation score. Is this score the same as the accuracy I would get from the confusion matrix?

Q3 -> In Part 4 I only asked for the accuracy score on the test data and had not yet called predict on it. How was it able to give me a score without predicting?

Please let me know if something is not clear & Thanks in advance :)

score 0 · Answer 1 · answered May 31 '22 at 13:26

I adjusted the code snippet so that it runs end to end and got the following accuracies on the iris dataset.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix  
import matplotlib.pyplot as plt
import seaborn as sns

decision_tree = DecisionTreeClassifier(random_state=0)
iris = load_iris()

# Hold out 25% of the data as the test set, then 25% of the remainder as the validation set
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

#score = cross_val_score(decision_tree, iris.data, iris.target, cv=10)

decision_tree.fit(X_train, y_train)
y_pred_val = decision_tree.predict(X_val)
acc_decision_tree_train = round(decision_tree.score(X_train, y_train) * 100, 2)
print("acc_decision_tree_train ", acc_decision_tree_train)

acc_decision_tree_val = round(decision_tree.score(X_val, y_val) * 100, 2)
print('accuracy:', acc_decision_tree_val)

con_mat=confusion_matrix(y_val, y_pred_val)
sns.heatmap(con_mat,annot=True,annot_kws= {"size":20},cmap="viridis")
plt.show()

acc_decision_tree_test = round(decision_tree.score(X_test, y_test) * 100, 2)
print('accuracy:', acc_decision_tree_test)
y_pred_test = decision_tree.predict(X_test)

Output:

 
acc_decision_tree_train 100.0
accuracy: 100.0
accuracy: 97.37
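
On Q1 and Q3: for a scikit-learn classifier, score(X, y) calls predict(X) internally and returns the mean accuracy against y, so no explicit predict call is needed beforehand, whether X is the train, validation, or test data. Below is a minimal sketch of that equivalence on an iris split like the one above. The variable names builtin_acc, manual_acc, and train_acc are mine, chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

decision_tree = DecisionTreeClassifier(random_state=0)
decision_tree.fit(X_train, y_train)

# score() predicts on X_test internally, then compares with y_test
builtin_acc = decision_tree.score(X_test, y_test)

# The same thing done by hand: predict first, then compute accuracy
manual_acc = accuracy_score(y_test, decision_tree.predict(X_test))

# Training accuracy works the same way: the fitted tree predicts on X_train
train_acc = decision_tree.score(X_train, y_train)

print("score():", builtin_acc)
print("predict() + accuracy_score():", manual_acc)
print("train accuracy:", train_acc)
```

The first two printed values should be identical. A tree with no depth limit can memorise the training data, which is why the training accuracy tends to be at or near 100% and is not a reliable estimate of how the model generalises.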

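On Q2: the accuracy implied by a confusion matrix is the sum of its diagonal (the correct predictions) divided by the sum of all its entries, and that should match what score returns on the same data. Here is a small sketch, assuming an iris train/validation split similar to the one in this answer. The names acc_from_matrix and acc_from_score are hypothetical and used only for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

decision_tree = DecisionTreeClassifier(random_state=0)
decision_tree.fit(X_train, y_train)

y_pred_val = decision_tree.predict(X_val)
con_mat = confusion_matrix(y_val, y_pred_val)

# Diagonal entries are correct predictions; all entries together are every sample
acc_from_matrix = np.trace(con_mat) / con_mat.sum()

# score() recomputes the predictions itself and returns mean accuracy
acc_from_score = decision_tree.score(X_val, y_val)

print("accuracy from confusion matrix:", acc_from_matrix)
print("accuracy from score():", acc_from_score)
```

Both numbers come from the same predictions on the same data, so they should agree up to floating-point rounding.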