My test data consists of strings, so I encoded them using one-hot encoding. Now I am looking for two things: extracting the feature names, and putting them into the plotted decision tree.

Also, the plotted decision tree shows `<=` in its split conditions, but for strings I need them to be `=`. How can I add the actual values to the plotted tree so that I am able to read it?

Here is the code I used:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn import tree

# Variable declaration: one row per attribute, one column per sample
X = np.array([
    ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
     "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"],
    ["hot", "hot", "hot", "mild", "cool", "cool", "cool",
     "mild", "cold", "mild", "mild", "mild", "hot", "mild"],
    ["high", "high", "high", "high", "normal", "normal", "normal",
     "high", "normal", "normal", "normal", "high", "normal", "high"],
    ["weak", "strong", "weak", "weak", "weak", "strong", "weak",
     "weak", "weak", "strong", "strong", "strong", "weak", "strong"],
])
Y = np.array(["no", "no", "yes", "yes", "yes", "no", "yes",
              "no", "yes", "yes", "yes", "yes", "yes", "no"])

# Transpose so that each row is one sample (Y is 1-D and needs no transpose)
X = X.transpose()

# Encode the string features as numeric columns using one-hot encoding
enc = preprocessing.OneHotEncoder()
enc.fit(X)

# Categories found in each feature of X
# print(enc.categories_)
Xenc = enc.transform(X).toarray()

# Encode the string labels as integers ("no" -> 0, "yes" -> 1) in a new
# array, instead of overwriting Y in place with the strings "0" and "1"
Yenc = (Y == "yes").astype(int)

# Train the tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(Xenc, Yenc)

# Plot the tree
tree.plot_tree(clf)
plt.show()
```