I am using the `randomForest` package in R for regression. It reports two kinds of information: "Mean of squared residuals" and "% Var explained". However, I want to calculate the RMSE and R^2 of the training and test sets. Can anyone help me find these values?

Please provide a minimal reproducible example of your code, including library dependencies and any functions you used. – mlegge Apr 21 '15 at 14:48
1 Answer
Sorry, this is not a specific answer, but I do not have enough reputation to leave a comment.
It is hard to say how to get what you want without a reproducible example. However, if you passed the `xtest=` and `ytest=` arguments in the call to `randomForest` (assuming you are using the "randomForest" package), then what you are looking for should be part of the resulting randomForest object. Look in the `test` component of the returned list.
An attempted example:
rf.results <- randomForest(x, y, xtest = xtest, ytest = ytest) # x, y, xtest, ytest are placeholders for your own data
rf.results$test$mse # test-set MSE, one value per forest size; take sqrt() to get RMSE
rf.results$test$rsq # test-set pseudo-R2 for random forest
If you have the randomForest package loaded, you can verify this and explore further yourself with `?randomForest`. The "Value" section of the documentation describes the object returned by a call to `randomForest` and where to find the various performance metrics.
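To make this concrete, here is a minimal sketch. It assumes the built-in `mtcars` data, an arbitrary 22/10 train/test split, and `ntree = 500`; none of these come from the question, they are only stand-ins so the code runs. It uses the `x`/`y` interface because `xtest`/`ytest` are not supported through the formula interface.

```r
library(randomForest)

set.seed(42)  # arbitrary seed so the split and the forest are repeatable

# Illustrative data only: predict mpg (column 1) from the other mtcars columns.
train_idx <- sample(nrow(mtcars), 22)
x_train <- mtcars[train_idx, -1]
y_train <- mtcars$mpg[train_idx]
x_test  <- mtcars[-train_idx, -1]
y_test  <- mtcars$mpg[-train_idx]

rf.results <- randomForest(x = x_train, y = y_train,
                           xtest = x_test, ytest = y_test,
                           ntree = 500)

# mse and rsq hold one value per forest size (1..ntree trees);
# the last element describes the full forest.
last <- rf.results$ntree

# Test-set metrics (only present because xtest/ytest were supplied).
test_rmse <- sqrt(rf.results$test$mse[last])
test_rsq  <- rf.results$test$rsq[last]

# Training-set metrics, based on out-of-bag predictions.
oob_rmse <- sqrt(rf.results$mse[last])
oob_rsq  <- rf.results$rsq[last]

print(c(test_rmse = test_rmse, test_rsq = test_rsq,
        oob_rmse = oob_rmse, oob_rsq = oob_rsq))
```

The "Mean of squared residuals" and "% Var explained" that are printed for a regression forest correspond to the last elements of `rf.results$mse` and `rf.results$rsq` (the latter multiplied by 100), so the RMSE is just the square root of the former.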

Thank you, but I have two more questions. 1: With rf.results$mse, can I calculate the mse and rsq of the training set? 2: Why do I get a vector as the result? I need just one number each for mse and rsq, but I think it gives me one mse and one rsq for each sample of the data. What should I do? – Farhaneh Moradi Apr 22 '15 at 07:11
`rf.results$mse` will give you the mse of the training set and `rf.results$rsq` will give the pseudo-R2 for the training set; both are computed from out-of-bag predictions. The mse and rsq from `rf.results$test` are performance measures on the validation (test) set passed via `xtest`/`ytest`. You should use these to find the optimal number of trees to have in the forest. The reason you get a vector of results is the `ntree` argument: you get performance measures for the random forests consisting of 1 to `ntree` trees, so the last element is the value for the full forest. – BazookaDave Apr 22 '15 at 15:29
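A small sketch of the vector point, again assuming `mtcars` as stand-in data and `ntree = 200`. The helpers `rmse()` and `r_squared()` are hypothetical names defined here, not part of randomForest. The sketch also contrasts the out-of-bag numbers with metrics from re-predicting the training rows, which tend to look much more optimistic.

```r
library(randomForest)

# Hypothetical helpers (not from the package): plain RMSE and R^2.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

set.seed(1)  # arbitrary seed for repeatability
x <- mtcars[, -1]
y <- mtcars$mpg
rf <- randomForest(x = x, y = y, ntree = 200)

# One entry per forest size, not per sample.
length(rf$mse)  # equals ntree

# Single numbers for the full forest: take the last element.
oob_mse_full <- rf$mse[rf$ntree]
oob_rsq_full <- rf$rsq[rf$ntree]

# The same out-of-bag metrics, recomputed from the OOB predictions.
oob_rmse <- rmse(y, rf$predicted)
oob_r2   <- r_squared(y, rf$predicted)

# Re-predicting the training rows with the full forest (resubstitution).
fit_pred   <- predict(rf, newdata = x)
resub_rmse <- rmse(y, fit_pred)
resub_r2   <- r_squared(y, fit_pred)

print(c(oob_rmse = oob_rmse, oob_r2 = oob_r2,
        resub_rmse = resub_rmse, resub_r2 = resub_r2))
```

If a test set was passed via `xtest`/`ytest`, note that the forest is not kept by default, so `predict()` on that object would need `keep.forest = TRUE` in the original call.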