I have a dataset with 1025 rows and 14 columns. First I split off the label into its own variable, separate from the features:
x = dataset.drop('label', axis=1)
y = dataset['label']
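Just to double-check what the label column contains (assuming dataset is a pandas DataFrame), I can look at the class counts:
# Number of rows per class (pandas)
print(y.value_counts())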
The label values are only either 1 or 0. Then I split the data using:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.30)
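If I wanted the split to be reproducible and keep the 0/1 balance the same in both sets, I believe I could pass the optional random_state and stratify arguments, something like:
# Reproducible, stratified version of the same split (optional)
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42, stratify=y)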
I then make my Classifier:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
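Right after fitting, one quick check I could do is compare training and test accuracy to see how much the tree overfits (a sketch using the score method):
# Accuracy on the data the tree was trained on vs. the held-out data
print("Train accuracy:", classifier.score(X_train, y_train))
print("Test accuracy:", classifier.score(X_test, y_test))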
Then whenever I plot my decision tree, it ends up too big:
from sklearn import tree
tree.plot_tree(classifier)  # classifier is already fitted above
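If the problem is just readability of the plot, plot_tree accepts a max_depth argument and the figure can be made larger with matplotlib; a sketch (the depth of 3 and the figure size are arbitrary choices of mine):
import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(16, 8))                           # larger canvas
tree.plot_tree(classifier, max_depth=3, filled=True)  # draw only the top 3 levels
plt.show()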
The resulting tree has 8 levels and the plot gets too big to read. I thought this was okay, but then I looked at the confusion matrix and classification report:
from sklearn.metrics import classification_report, confusion_matrix

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
It results in:
[[155   3]
 [  3 147]]

              precision    recall  f1-score   support

           0       0.98      0.98      0.98       158
           1       0.98      0.98      0.98       150

    accuracy                           0.98       308
   macro avg       0.98      0.98      0.98       308
weighted avg       0.98      0.98      0.98       308
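To see whether this score holds up beyond a single random split, one thing I could try is cross-validation (a sketch with cross_val_score; the 5 folds are an arbitrary choice):
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Mean and spread of 5-fold cross-validated accuracy on the full data
scores = cross_val_score(DecisionTreeClassifier(), x, y, cv=5)
print(scores.mean(), scores.std())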
The high accuracy makes me doubt my solution. Is there something wrong with my code, and how can I rein in the size of the decision tree and get a more realistic accuracy score?