I have been reading the Random Forest documentation and I am confused about how to apply a cross-validated model (evaluated on the training data) to the test data in order to make classification predictions.

My code is below, but I have no idea how to use it to predict. Normally you would fit the model and then call `predict`, but I've read that you don't have to call `fit` with Random Forest. So how do I call `predict` if I haven't called `fit` first? So confused!

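`from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='precision')
y_pred = clf.predict(X_test)  # this line raises the error below`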

#NotFittedError: Estimator not fitted, call fit before exploiting the model.

Ali
  • You need to fit() the model again on your `X_train` and `y_train`. Have a look at my [answer here](https://stackoverflow.com/a/42266274/3374996). – Vivek Kumar May 30 '17 at 15:42
  • Thanks Vivek, so from a coding point of view, how do I now feed 'score' (which is now defined) into my prediction? – Ali May 31 '17 at 03:10
  • I did not understand. Can you explain a bit more? – Vivek Kumar May 31 '17 at 05:27
  • So `scores` now has the CV applied to it. How would I call predict with the new `scores` object I just defined? – Ali May 31 '17 at 11:49
  • The `scores` variable you get by executing `cross_val_score()` is an array of scores returned from the estimator, one for each CV fold. It cannot be used in the `predict()` method. Why do you want to use it in the `predict()` method? – Vivek Kumar May 31 '17 at 11:57
  • Okay, so the scores just tell me how good the training model is, correct? How do I know what constitutes a good score? Is there a general benchmark that guides this decision? – Ali May 31 '17 at 22:37
  • Generally, the higher the score, the better the model. But you need to check whether the model is overfitting. With `cross_val_score`, a score can come out high if one test fold closely matches the training folds. You can read more [about it here](http://scikit-learn.org/stable/model_selection.html) and also search on https://stats.stackexchange.com – Vivek Kumar Jun 01 '17 at 01:52
  • Thank you Vivek! Appreciate your patience – Ali Jun 01 '17 at 06:28
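
The point made in the comments can be sketched in code. `cross_val_score` fits clones of the estimator on each fold and returns only the per-fold scores, so `clf` itself is still unfitted afterwards, which is why `predict` raises `NotFittedError`. The sketch below is a minimal illustration and not the asker's actual setup: the data comes from `make_classification` and `train_test_split` as a stand-in, because the question does not show how `X_train`, `y_train` and `X_test` were built.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary-classification data (an assumption for illustration only;
# the question's real X_train / y_train / X_test are not shown).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)

# Evaluation step: one precision score per fold. This works on clones of clf,
# so clf itself remains unfitted after the call.
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='precision')
print("CV precision: mean=%.3f, std=%.3f" % (scores.mean(), scores.std()))

# Prediction step: fit on the full training set, then predict on the test set.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```

Here `scores` is used only to judge how well this model configuration generalizes. It plays no part in `predict`, which needs the explicitly fitted `clf`.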
