Questions tagged [random-forest]

In machine learning and statistical classification, a random forest is an ensemble classifier that consists of many decision trees. It outputs the class that is the mode of the classes output by the individual trees, in other words, the class predicted most often.

Overview

A random forest is an ensemble learning method for classification (and regression) that operates by constructing a multitude of decision trees at training time. For classification it outputs the class that is the mode of the classes output by the individual trees; for regression it outputs the mean of the individual trees' predictions.
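The voting step described above can be sketched in a few lines of Python. This is only an illustration of taking the mode of per-tree class labels. The helper name `majority_vote` and the sample labels are hypothetical, and real libraries may combine tree outputs differently (for example, by averaging class probabilities instead of counting hard votes).

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among per-tree predictions.

    Hypothetical helper for illustration; ties go to the label that
    appears first in the input.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    # most_common(1) returns a list holding one (label, count) pair
    return Counter(tree_predictions).most_common(1)[0][0]


# Made-up labels predicted by five trees for a single sample
votes = ["X", "Y", "X", "X", "Y"]
forest_prediction = majority_vote(votes)
print(forest_prediction)
```

With an odd number of trees and two classes a tie cannot occur; otherwise a tie-breaking rule has to be chosen, and the one used here (first label seen) is an arbitrary choice.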

References

Random Forest page maintained by Leo Breiman and Adele Cutler, the creators of the algorithm.
Wikipedia pages on Random Trees, Random Forest and Ensemble Learning.
The CRAN page for the R randomForest package.

Tag usage

Questions tagged random-forest should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

3684 questions

207

votes

25 answers

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision-rules (or 'decision paths') from a trained tree in a decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'

python machine-learning scikit-learn decision-tree random-forest

asked Nov 26 '13 at 17:58

Dror Hilman

6,837
9
39
56

143

votes

7 answers

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result, I would like to find out which attributes/dates contribute to the result…

scikit-learn random-forest feature-selection

asked Apr 04 '13 at 11:53

user2244670

1,431
2
10
3

104

votes

3 answers

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit learn? I've spent a good bit of time reading the paper: P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning,…

scikit-learn random-forest

asked Mar 14 '14 at 15:50

denson

2,366
2
24
25

102

votes

6 answers

Do I need to normalize (or scale) data for randomForest (R package)?

I am doing a regression task - do I need to normalize (or scale) data for randomForest (R package)? And is it necessary to also scale the target values? And if so - I want to use the scale function from the caret package, but I did not find how to get data back…

r random-forest

asked Jan 22 '12 at 14:01

gutompf

1,305
3
11
9

votes

8 answers

RandomForestClassifier.fit(): ValueError: could not convert string to float

Given is a simple CSV file: A,B,C Hello,Hi,0 Hola,Bueno,1 Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so: cols =…

python scikit-learn random-forest

asked May 21 '15 at 21:51

nilkn

votes

3 answers

How to use random forests in R with missing values?

library(randomForest) rf.model <- randomForest(WIN ~ ., data = learn) I would like to fit a random forest model, but I get this error: Error in na.fail.default(list(WIN = c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, : missing values in object I have data…

r random-forest

asked Dec 03 '11 at 19:44

Borut Flis

15,715
30
92
119

votes

6 answers

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell…

python scikit-learn random-forest one-hot-encoding

asked Jul 12 '14 at 16:54

tkunk

1,378
1
13
19

votes

2 answers

What is out of bag error in Random Forests?

What is out of bag error in Random Forests? Is it the optimal parameter for finding the right number of trees in a Random Forest?

language-agnostic machine-learning classification random-forest

asked Aug 30 '13 at 21:46

csalive

votes

2 answers

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

I'm running GridSearchCV to optimize the parameters of a classifier in scikit. Once I'm done, I'd like to know which parameters were chosen as the best. Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute…

python scikit-learn random-forest cross-validation

asked May 07 '15 at 13:45

sapo_cosmico

6,274
12
45
58

votes

2 answers

How do I solve overfitting in random forest of Python sklearn?

I am using the RandomForestClassifier implemented in the Python sklearn package to build a binary classification model. Below are the cross-validation results: Fold 1 : Train: 164 Test: 40 Train Accuracy: 0.914634146341 Test Accuracy: 0.55 Fold 2 :…

python machine-learning scikit-learn decision-tree random-forest

asked Dec 09 '13 at 04:40

Munichong

3,861
14
48
69

votes

8 answers

Random Forest Feature Importance Chart using Python

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import RandomForestRegressor MT= pd.read_csv("MT_reduced.csv") df…

python plot random-forest feature-selection

asked May 21 '17 at 20:26

user348547

votes

3 answers

R Random Forests Variable Importance

I am trying to use the random forests package for classification in R. The Variable Importance Measures listed are: mean raw importance score of variable x for class 0 mean raw importance score of variable x for class…

r statistics data-mining random-forest

asked Apr 10 '09 at 02:18

thirsty93

2,602
6
26
26

votes

6 answers

multioutput regression by xgboost

Is it possible to train a model by xgboost that has multiple continuous outputs (multi-regression)? What would be the objective of training such a model? Thanks in advance for any suggestions

machine-learning random-forest xgboost

asked Sep 16 '16 at 21:10

user1782011

votes

4 answers

How to tune parameters in Random Forest, using Scikit Learn?

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, …

python parameters machine-learning scikit-learn random-forest

asked Mar 19 '16 at 22:10

O.rka

29,847
68
194
309

votes

5 answers

setting values for ntree and mtry for random forest regression model

I'm using the R package randomForest to do a regression on some biological data. My training data size is 38772 x 201. I just wondered: what would be a good value for the number of trees ntree and the number of variables per level mtry? Is there an…

r statistics machine-learning regression random-forest

asked Dec 19 '12 at 16:09

DOSMarter

1,485
5
21
29
