
I am new to ML and am trying to train a decision-tree-based model.

I tried the following:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X = df[['Quantity']]
y = df[['label']]
params = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3, 5, 10]}
clf_dt = DecisionTreeClassifier()
clf = GridSearchCV(clf_dt, param_grid=params, scoring='f1')
clf.fit(X, y)
clf_dt = DecisionTreeClassifier(clf.best_params_)

This gave me the following warning:

FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  warnings.warn(f"Pass {args_msg} as keyword args. From version "

Later, I ran the following code and got an error, even though I had already fitted the model with .fit():

from sklearn import tree

tree.plot_tree(clf_dt, filled=True, feature_names=list(X.columns),
               class_names=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

How can I fix this error?

desertnaut
The Great

2 Answers


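If you go with best_params_, you'll have to refit the model with those parameters. Note that the dict must be unpacked with ** so that its entries are passed as keyword arguments; passed positionally, it is assigned to the first constructor parameter, criterion, which is exactly what the FutureWarning complains about: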

clf_dt = DecisionTreeClassifier(**clf.best_params_)
clf_dt.fit(X, y)
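Put together, the whole flow might look like the sketch below. It is not the OP's exact setup: the Iris dataset (via load_iris) stands in for the hypothetical df, random_state=0 is added only for reproducibility, and scoring='f1_macro' replaces 'f1' because the plain 'f1' scorer only supports binary targets while Iris has three classes. It also uses a 1-D y rather than the one-column DataFrame df[['label']]. export_text is used instead of plot_tree so the sketch runs without matplotlib; both need a fitted tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris stands in for the OP's DataFrame; y is a 1-D Series, not a one-column DataFrame.
X, y = load_iris(return_X_y=True, as_frame=True)

params = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3, 5, 10]}
clf = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid=params,
                   scoring='f1_macro')  # 'f1' alone only works for binary targets
clf.fit(X, y)

# Unpack best_params_ into keyword arguments, then fit the new model explicitly.
clf_dt = DecisionTreeClassifier(**clf.best_params_, random_state=0)
clf_dt.fit(X, y)

# Like plot_tree, export_text needs a fitted tree; it avoids the matplotlib dependency.
print(export_text(clf_dt, feature_names=list(X.columns)))
```

With matplotlib installed, tree.plot_tree(clf_dt, filled=True, feature_names=list(X.columns)) should then work on the same clf_dt.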

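Alternatively, you can use the best_estimator_ attribute to access the best model directly. With the default refit=True, GridSearchCV has already refitted it on all of X, y, so no further .fit() call is needed: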

clf_dt = clf.best_estimator_
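A minimal sketch of the best_estimator_ route, again assuming Iris in place of the OP's df and scoring='f1_macro' (a multiclass-safe substitute for 'f1'). It checks that the stored estimator is an already fitted DecisionTreeClassifier carrying the winning parameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
params = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3, 5, 10]}

# refit=True is the default (spelled out here): after the search, the best
# parameter combination is refitted on all of X, y and stored in best_estimator_.
clf = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid=params,
                   scoring='f1_macro', refit=True)
clf.fit(X, y)

clf_dt = clf.best_estimator_          # already fitted, no extra .fit() needed
print(type(clf_dt).__name__)          # DecisionTreeClassifier
print(clf.best_params_)
print(clf_dt.get_params()['max_depth'] == clf.best_params_['max_depth'])  # True
```

This avoids a second training run and guarantees the plotted tree is the one the search selected.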
a_guest

There are two separate problems here.

Firstly

Referring to

FutureWarning: Pass criterion={'max_depth': 2, 'min_samples_split': 2} as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error

You might try building params with the dict constructor instead:

params = dict(max_depth=[2,3,4], min_samples_split=[2,3,5,10])

That said, this warning looks odd to me, and it did not occur when I ran the code.
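The warning text itself hints at the cause: "Pass criterion={...}" means the dict ended up in criterion, the first parameter of the DecisionTreeClassifier constructor, because clf.best_params_ was passed positionally in DecisionTreeClassifier(clf.best_params_). The sketch below illustrates this; best_params is a hypothetical stand-in for clf.best_params_. On scikit-learn 0.24 the positional call only emits the FutureWarning, while on 1.0 and later the constructor is keyword-only and it raises a TypeError.

```python
import inspect

from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for clf.best_params_.
best_params = {'max_depth': 2, 'min_samples_split': 2}

# The first constructor parameter (after self) is `criterion`, which is where a
# positionally passed dict lands, hence "Pass criterion={...} as keyword args".
init_params = [p for p in inspect.signature(DecisionTreeClassifier.__init__).parameters
               if p != "self"]
first_param = init_params[0]
print(first_param)  # criterion

try:
    DecisionTreeClassifier(best_params)  # warns on 0.24, TypeError on >= 1.0
except TypeError as exc:
    print("TypeError:", exc)

# Unpacking turns the dict entries into keyword arguments.
clf_dt = DecisionTreeClassifier(**best_params)
print(clf_dt.get_params()['max_depth'])  # 2
```

So the fix for the warning is to unpack the dict with **, not to change how params is built.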

Secondly

Referring to

NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Fitting is a mandatory step in sklearn, and as you said, you already fitted a model in your first code example. The problem is that with

clf_dt = DecisionTreeClassifier(clf.best_params_)

you instantiate a new DecisionTreeClassifier, which is therefore not fitted when you call

tree.plot_tree(clf_dt ...)

When you call

clf.fit(X, y)

GridSearchCV (with its default refit=True) refits the best parameter combination on all of X, y and stores that fitted model as clf.best_estimator_. So just use that attribute, e.g. tree.plot_tree(clf.best_estimator_, ...). The extra step clf_dt = DecisionTreeClassifier(clf.best_params_) isn't necessary.
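A minimal sketch reproducing the situation, assuming Iris in place of the OP's df, a reduced grid, and scoring='f1_macro' instead of 'f1'. check_is_fitted from sklearn.utils.validation raises the same NotFittedError that plot_tree reported. It also shows that the fitted search object is not itself a tree, which is why tree-specific helpers such as plot_tree should receive clf.best_estimator_ rather than clf.

```python
from sklearn.datasets import load_iris
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

X, y = load_iris(return_X_y=True, as_frame=True)
clf = GridSearchCV(DecisionTreeClassifier(random_state=0),
                   param_grid={'max_depth': [2, 3, 4]}, scoring='f1_macro')
clf.fit(X, y)

# A freshly constructed estimator is unfitted, whatever parameters it receives.
fresh = DecisionTreeClassifier(**clf.best_params_)
try:
    check_is_fitted(fresh)
except NotFittedError:
    print("fresh tree: not fitted")

# The search object wraps the best model but is not a tree itself.
print(isinstance(clf, DecisionTreeClassifier))                  # False
print(isinstance(clf.best_estimator_, DecisionTreeClassifier))  # True
check_is_fitted(clf.best_estimator_)  # passes silently: already fitted
```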

Robin
  • `params = dict(max_depth=[2,3,4], min_samples_split=[2,3,5,10])` is exactly what the OP used and `clf = GridSearchCV(clf_dt, param_grid=**params, scoring='f1')` is invalid syntax. – a_guest Jan 26 '22 at 14:06
  • You're right about the invalid syntax; my mistake. The alternative way of creating the param dict was just a suggestion. I don't really understand why this warning occurs. – Robin Jan 26 '22 at 15:45