Confused on Random Forest Classification, CV and Predicting on Test Set

Question

I have been reading the Random Forest documentation and I am confused about how to apply a cross-validated model (built on the training data) to the test data in order to make classification predictions.

My code is below, but I have no idea how to use it to predict. Normally you would fit the model and then call `predict`, but I've read that you don't have to call `fit` with Random Forest. So how do I call `predict` if I haven't called `fit` first? So confused!

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='precision')
y_pred = clf.predict(X_test)
# NotFittedError: Estimator not fitted, call fit before exploiting the model.
```

You need to fit() the model again on your `X_train` and `y_train`. Have a look at my [answer here](https://stackoverflow.com/a/42266274/3374996). — Vivek Kumar, May 30 '17 at 15:42
Thanks Vivek. So from a coding point of view, how do I now feed `scores` (which is now defined) into my prediction? — Ali, May 31 '17 at 03:10
So scores now has the CV applied to it so how would I call predict with the new scores object I just defined? — Ali, May 31 '17 at 11:49
The `scores` variable you get from `cross_val_score()` is an array holding the estimator's score on each CV fold. It cannot be passed to the `predict()` method. Why do you want to use it in `predict()`? — Vivek Kumar, May 31 '17 at 11:57
Okay, so the scores just tell me how good the training model is, correct? How do I know what constitutes a good score? Is there a general benchmark that guides this decision? — Ali, May 31 '17 at 22:37
Generally, the higher the score, the better the model, but you also need to check whether the model is overfit. With `cross_val_score`, a score can come out higher if one test fold closely matches the training folds. You can read more [about it here](http://scikit-learn.org/stable/model_selection.html) and also search on https://stats.stackexchange.com — Vivek Kumar, Jun 01 '17 at 01:52
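
The advice in the comments above can be put together in a minimal sketch. It assumes a binary classification problem. The synthetic data from `make_classification` and the split from `train_test_split` are stand-ins for the asker's own `X_train`, `y_train` and `X_test`, which are not shown in the question. `cross_val_score` fits clones of the estimator on each fold, so `clf` itself stays unfitted and still needs an explicit `fit` before `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: synthetic binary data stands in for the asker's real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                             min_samples_split=2, random_state=0)

# Step 1: estimate performance. This returns one precision value per fold
# and does not fit `clf` itself (internal clones are fitted and discarded).
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring='precision')
print("CV precision: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Step 2: fit the estimator on the full training set.
clf.fit(X_train, y_train)

# Step 3: predict on the held-out test set with the fitted estimator.
y_pred = clf.predict(X_test)
```

The `scores` array is only a performance estimate. Its mean and spread across folds help judge whether the model is worth fitting on all of the training data, but it plays no part in the `predict` call.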
