
I want to plot the number of trees against the OOB error for my RandomForestRegressor and GradientBoostingRegressor. I wrote the lines below, but for some reason I get 'numpy.ndarray' object is not callable. Does anybody know why this doesn't work? I hope you have a nice day, and thank you!

train_results = []
test_results = []
list_nb_trees = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

for nb_trees in list_nb_trees:
    rf = RandomForestRegressor(n_estimators=nb_trees,
                               max_depth=None,
                               max_features=50,
                               min_samples_leaf=5,
                               min_samples_split=2,
                               random_state=42,
                               oob_score=True,
                               n_jobs=-1)
    rf.fit(X_train_v1, y_train_v1)

train_results.append(mean_squared_error(y_train_v1, rf.oob_prediction_(X_train_v1)))
test_results.append(mean_squared_error(y_test_v1, rf.oob_prediction_(X_test_v1)))

plt.figure(figsize=(15, 5))
line2, = plt.plot(list_nb_trees, test_results, color="g", label="Test OOB Score")
line1, = plt.plot(list_nb_trees, train_results, color="b", label="Training OOB Score")
plt.title('Training and Test Out-of-Bag Score')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel('MSE')
plt.xlabel('n_estimators')
plt.show()
/opt/conda/lib/python3.7/site-packages/sklearn/ensemble/forest.py:737: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
(the same warning is repeated several times)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-282-80f6bbb31b23> in <module>
     14     rf.fit(X_train_v1, y_train_v1)
     15 
---> 16 train_results.update(mean_squared_error(y_train_v1, rf.oob_prediction_(X_train_v1)))
     17 test_results.update(mean_squared_error(y_test_v1, rf.oob_prediction_(X_test_v1)))
     18 

TypeError: 'numpy.ndarray' object is not callable
– ml_learner

1 Answer


Have a look here. oob_prediction_ is an array containing the OOB predictions for your training set.

Your code should therefore be more like:

train_oob_mse = mean_squared_error(y_train_v1, rf.oob_prediction_)

All test samples are, in a sense, "out of bag", but it's uncommon to call them that. It's simply the test error, and you'll have to call predict to calculate it:

test_mse = mean_squared_error(y_test_v1, rf.predict(X_test_v1))

That being said, your code only keeps the last trained rf, so each of your *_results lists will contain just one value; I imagine that is just a copy/paste mistake. Furthermore, the warning "Some inputs do not have OOB scores." indicates that the way you calculate the OOB error is not correct, since some samples will actually have no OOB prediction at all.
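Putting these pieces together, a minimal sketch of how the loop could look (keeping the variable names X_train_v1, y_train_v1, X_test_v1 and y_test_v1 from your question; the warning about too few trees may still appear for small n_estimators, in which case the OOB estimate is unreliable):

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

train_results = []
test_results = []
list_nb_trees = list(range(5, 105, 5))

for nb_trees in list_nb_trees:
    rf = RandomForestRegressor(n_estimators=nb_trees,
                               max_features=50,
                               min_samples_leaf=5,
                               random_state=42,
                               oob_score=True,
                               n_jobs=-1)
    rf.fit(X_train_v1, y_train_v1)

    # oob_prediction_ is an attribute (an array), not a method: no parentheses, no argument
    train_results.append(mean_squared_error(y_train_v1, rf.oob_prediction_))
    # there is no OOB notion for the test set; just use the ordinary predictions
    test_results.append(mean_squared_error(y_test_v1, rf.predict(X_test_v1)))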

– mrks
  • Ok, thanks for the explanation. Do you know how I can get the complete rf, not just the last one? I don't think this code will work either; I get this traceback: warn("Some inputs do not have OOB scores. " /opt/conda/lib/python3.7/site-packages/sklearn/ensemble/forest.py:737: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates. warn("Some inputs do not have OOB scores. " – ml_learner Jan 10 '20 at 16:24
  • A naive approach would be to calculate the errors within the loop (i.e. for each trained forest). To avoid the issue with the OOB error, you might consider calculating the _training_ error instead of the OOB (see the sketch after these comments). – mrks Jan 10 '20 at 16:58
  • You think it could be a copy/paste mistake. The original code was train_results.append(mean_squared_error(y_train, rf.predict(X_train))) test_results.append(mean_squared_error(y_test, rf.predict(X_test))) to compute the training and test error. Does this make sense to you? – ml_learner Jan 10 '20 at 17:20
  • 1
    What i meant was that you "lost" the indentation for those two lines when you copy/pasted it. They should be within the loop. – mrks Jan 10 '20 at 17:23
  • Ah, I think I understand you now. The two lines belong under the rf.fit(X_train, y_train) statement? – ml_learner Jan 10 '20 at 17:29
  • Yes, exactly those. – mrks Jan 10 '20 at 17:30
  • I hadn't noticed that. Thanks! – ml_learner Jan 10 '20 at 17:31
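For completeness, here is a sketch of the training-/test-error variant discussed in the comments above, again using the question's variable names. It sidesteps oob_prediction_ and the OOB warning entirely, at the price of reporting the training error rather than a true out-of-bag estimate:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

train_results = []
test_results = []
list_nb_trees = list(range(5, 105, 5))

for nb_trees in list_nb_trees:
    rf = RandomForestRegressor(n_estimators=nb_trees,
                               random_state=42,
                               n_jobs=-1)
    rf.fit(X_train_v1, y_train_v1)

    # plain training and test error, computed inside the loop for each forest
    train_results.append(mean_squared_error(y_train_v1, rf.predict(X_train_v1)))
    test_results.append(mean_squared_error(y_test_v1, rf.predict(X_test_v1)))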