
I'm trying to draw the tree using the following code:

import graphviz  # needed for graphviz.Source at the end
import sklearn.tree
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
model1 = sklearn.tree.DecisionTreeClassifier()


covidCases['New_cases'].value_counts()
feature_cols = ['New_cases', 'New_deaths']  # note: 'New_deaths' is also used as the target y below
X = covidCases[feature_cols] # Features
y = covidCases['New_deaths']
print(X)
print(y)

X_train, X_test, y_train, y_test = train_test_split(X,    # predictive features
                                                      y,      # target column
                                                      test_size=0.30,    # 30% of dataset will be set aside for test set
                                                      random_state=1)

clf = DecisionTreeClassifier()

# Train Decision Tree Classifier
clf = clf.fit(X_train,y_train)

# Predict the response for the test dataset
y_pred = clf.predict(X_test)

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))
dot_data = sklearn.tree.export_graphviz(clf, out_file=None, 
                                feature_names=X.columns,  
                                class_names=y.unique(),
                                filled=True)

graph = graphviz.Source(dot_data, format="png") 
graph

But I'm getting the error TypeError: can only concatenate str (not "numpy.int64") to str. I'm fairly new to Python, so any help would be appreciated. The error is raised in the graph-plotting step.

Update: the full error message is:

  TypeError                                 Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 dot_data = sklearn.tree.export_graphviz(clf, out_file=None, 
      2                                 feature_names=X.columns,  
      3                                 class_names=y.unique(),
      4                                 filled=True)
      6 graph = graphviz.Source(dot_data, format="png") 
      7 graph

File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/tree/_export.py:889, in export_graphviz(decision_tree, out_file, max_depth, feature_names, class_names, label, filled, leaves_parallel, impurity, node_ids, proportion, rotate, rounded, special_characters, precision, fontname)
    870     out_file = StringIO()
    872 exporter = _DOTTreeExporter(
    873     out_file=out_file,
    874     max_depth=max_depth,
   (...)
    887     fontname=fontname,
    888 )
--> 889 exporter.export(decision_tree)
    891 if return_string:
    892     return exporter.out_file.getvalue()

File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/tree/_export.py:462, in _DOTTreeExporter.export(self, decision_tree)
    460     self.recurse(decision_tree, 0, criterion="impurity")
    461 else:
--> 462     self.recurse(decision_tree.tree_, 0, criterion=decision_tree.criterion)
    464 self.tail()

File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/tree/_export.py:521, in _DOTTreeExporter.recurse(self, tree, node_id, criterion, parent, depth)
    517 else:
    518     self.ranks[str(depth)].append(str(node_id))
    520 self.out_file.write(
--> 521     "%d [label=%s" % (node_id, self.node_to_str(tree, node_id, criterion))
    522 )
    524 if self.filled:
    525     self.out_file.write(
    526         ', fillcolor="%s"' % self.get_fill_color(tree, node_id)
    527     )

File ~/opt/anaconda3/lib/python3.9/site-packages/sklearn/tree/_export.py:374, in _BaseTreeExporter.node_to_str(self, tree, node_id, criterion)
    368     else:
    369         class_name = "y%s%s%s" % (
    370             characters[1],
    371             np.argmax(value),
    372             characters[2],
    373         )
--> 374     node_string += class_name
    376 # Clean up any trailing newlines
    377 if node_string.endswith(characters[4]):

TypeError: can only concatenate str (not "numpy.int64") to str

Data looks like: [image of the covidCases data, not available]
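In the last frame of the traceback, export_graphviz appends each node's class name to the label string with node_string += class_name, and that is the line that fails. The sketch below reproduces just that step in isolation, assuming integer labels like the ones y.unique() returns for New_deaths (the values 0, 3, 7 and the helper build_label are made up for illustration):

```python
import numpy as np

# Hypothetical integer labels, similar to the array y.unique() returns
class_names = np.array([0, 3, 7])

def build_label(name):
    """Hypothetical helper mimicking the failing line: node_string += class_name."""
    node_string = "class = "
    node_string += name
    return node_string

try:
    # class_names[1] is a NumPy integer scalar (numpy.int64 on most platforms)
    build_label(class_names[1])
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Converting the label to str first makes the concatenation valid
fixed = build_label(str(class_names[1]))
print(fixed)  # class = 3
```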

Ali
  • You can't concatenate objects of different data types: both operands of + need to be of type str. But when you print using a comma (,), print() formats the integer automatically. See more [here](https://stackoverflow.com/questions/53846331/typeerror-can-only-concatenate-str-not-numpy-int64-to-str) – LSeu Aug 26 '22 at 07:49
  • But the X and y values are both numbers, representing the number of deaths and new cases; both are integers here. @LSeu – Ali Aug 26 '22 at 07:55
  • Can you give us the full traceback of your error message, please? – LSeu Aug 26 '22 at 07:58
  • Also, the concatenation problem occurs at the print() where you try to print an np.int64 concatenated to a str. – LSeu Aug 26 '22 at 08:02
  • @LSeu Done, check the updated post. – Ali Aug 26 '22 at 08:07
  • Could you point out exactly which line you're talking about, if that's okay with you? – Ali Aug 26 '22 at 08:21
  • @LSeu I fixed the () issue; now I only have the issue mentioned in the question, the string/integer one. – Ali Aug 26 '22 at 08:32
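As the first comment notes, print() with comma-separated arguments converts each argument to a string itself, so the print calls in the question are not where the error comes from; only building a single string with + needs an explicit conversion. A small sketch with made-up values (accuracy and deaths are hypothetical stand-ins):

```python
import numpy as np

accuracy = np.float64(0.75)  # hypothetical metric value stored as a NumPy scalar
deaths = np.int64(3)         # hypothetical count taken from the New_deaths column

# print() calls str() on each comma-separated argument, so mixing types is fine
print("Accuracy:", accuracy)

# With + both operands must already be str: convert explicitly or use an f-string
message_plus = "New deaths: " + str(deaths)
message_fstring = f"New deaths: {deaths}"
print(message_plus)
print(message_fstring)
```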

2 Answers


I think I found the issue: it was y.unique(), which returns an array of integers. Converting it to a list of strings with val = np.array(y.unique()).astype('str').tolist() did the trick.
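A self-contained sketch of the same fix follows, under a few assumptions: a small made-up DataFrame stands in for covidCases, only New_cases is used as a feature (in the question New_deaths is both a feature and the target), and the class names are taken from clf.classes_ instead of y.unique(). clf.classes_ holds the labels seen during training in sorted order, which matches the "ascending numerical order" the export_graphviz documentation asks for; y.unique() returns labels in order of first appearance and can include labels the classifier never saw in training.

```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for covidCases; the real data is not shown in the question
covidCases = pd.DataFrame({
    "New_cases":  [10, 25, 3, 40, 18, 7, 33, 12, 5, 29],
    "New_deaths": [0, 1, 0, 2, 1, 0, 2, 1, 0, 1],
})

X = covidCases[["New_cases"]]   # features (target column excluded)
y = covidCases["New_deaths"]    # integer class labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)
clf = tree.DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# classes_ is sorted ascending; astype(str) turns each integer label into a str
class_names = clf.classes_.astype(str).tolist()

dot_data = tree.export_graphviz(
    clf,
    out_file=None,                      # return the DOT source as a string
    feature_names=X.columns.tolist(),
    class_names=class_names,
    filled=True,
)
print(class_names)
```

The resulting dot_data string can then be passed to graphviz.Source(dot_data, format="png") as in the question.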

Ali

Mine is not an answer but a suggestion: kindly post the entire corrected code as per your solution above. For example, I can see you've referenced np (NumPy), but I can't see it imported in your original code.

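import numpy as np
from sklearn import tree

# Write the fitted tree to a .dot file; class_names must be strings,
# listed in ascending order of the class labels
tree.export_graphviz(model1, out_file="BundesligaGenFTRPredictor.dot",
                     feature_names=["HTHG", "HTAG"],
                     class_names=np.array(sorted(y.unique())).astype('str').tolist(),
                     label="all", rounded=True, filled=True)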
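The sorted() call in this version matters: the classifier stores its classes in ascending order and export_graphviz picks class names by position in that order, whereas y.unique() returns labels in order of first appearance. A small sketch with made-up labels shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical target column whose labels do not appear in sorted order
y = pd.Series([7, 0, 3, 0, 7])

# Series.unique() keeps the order of first appearance
appearance_order = np.array(y.unique()).astype('str').tolist()

# sorted() puts the labels in ascending order, matching the classifier's classes_
sorted_order = np.array(sorted(y.unique())).astype('str').tolist()

print(appearance_order)  # ['7', '0', '3']
print(sorted_order)      # ['0', '3', '7']
```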
Altimus Prime
Frank