Questions tagged [random-forest]

In machine learning and statistical classification, a random forest is an ensemble classifier that consists of many decision trees. It outputs the class that is the mode of the classes output by the individual trees, that is, the class predicted most often.

Overview

Random forests are an ensemble learning method for classification (and regression) that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes output by the individual trees (or, for regression, the mean of their predictions).
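
The "mode of the classes" step can be illustrated with a minimal sketch. It assumes the per-tree predictions are already available as a plain list of labels; the function name `majority_vote` is hypothetical and is not part of any library. Real implementations may combine trees differently (for example, by averaging predicted class probabilities), so this only shows the voting idea described above.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among per-tree predictions.

    Hypothetical helper for illustration only; not a library API.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    # most_common(1) returns [(label, count)] for the most frequent label.
    # When counts tie, the label encountered first is returned.
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Five hypothetical trees vote on one sample.
votes = ["X", "Y", "X", "X", "Y"]
print(majority_vote(votes))
```

Here three of the five trees predict `X`, so `X` is the class the ensemble reports.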

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3684 questions
207 votes, 25 answers

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'
143 votes, 7 answers

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result, I would like to find out which attributes/dates contribute to the result…
user2244670
104 votes, 3 answers

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit-learn? I've spent a good bit of time reading the paper: P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning,…
denson
102 votes, 6 answers

Do I need to normalize (or scale) data for randomForest (R package)?

I am doing a regression task. Do I need to normalize (or scale) data for randomForest (R package)? And is it necessary to also scale the target values? If so, I want to use the scale function from the caret package, but I did not find how to get the data back…
gutompf
89 votes, 8 answers

RandomForestClassifier.fit(): ValueError: could not convert string to float

Given is a simple CSV file: A,B,C Hello,Hi,0 Hola,Bueno,1 Obviously the real dataset is far more complex than this, but this one reproduces the error. I'm attempting to build a random forest classifier for it, like so: cols =…
nilkn
83 votes, 3 answers

How to use random forests in R with missing values?

library(randomForest) rf.model <- randomForest(WIN ~ ., data = learn) I would like to fit a random forest model, but I get this error: Error in na.fail.default(list(WIN = c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, : missing values in object I have data…
Borut Flis
80 votes, 6 answers

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell…
tkunk
73 votes, 2 answers

What is out of bag error in Random Forests?

What is out of bag error in Random Forests? Is it the optimal parameter for finding the right number of trees in a Random Forest?
58 votes, 2 answers

How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

I'm running GridSearchCV to optimize the parameters of a classifier in scikit. Once I'm done, I'd like to know which parameters were chosen as the best. Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute…
sapo_cosmico
54 votes, 2 answers

How do I solve overfitting in random forest of Python sklearn?

I am using the RandomForestClassifier from the Python sklearn package to build a binary classification model. Below are the cross-validation results: Fold 1 : Train: 164 Test: 40 Train Accuracy: 0.914634146341 Test Accuracy: 0.55 Fold 2 :…
Munichong
52 votes, 8 answers

Random Forest Feature Importance Chart using Python

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import RandomForestRegressor MT= pd.read_csv("MT_reduced.csv") df…
user348547
46 votes, 3 answers

R Random Forests Variable Importance

I am trying to use the random forests package for classification in R. The Variable Importance Measures listed are: mean raw importance score of variable x for class 0 mean raw importance score of variable x for class…
thirsty93
46 votes, 6 answers

multioutput regression by xgboost

Is it possible to train a model with xgboost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model? Thanks in advance for any suggestions.
user1782011
44 votes, 4 answers

How to tune parameters in Random Forest, using Scikit Learn?

class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, …
O.rka
42 votes, 5 answers

setting values for ntree and mtry for random forest regression model

I'm using the R package randomForest to do a regression on some biological data. My training data size is 38772 x 201. I just wondered: what would be a good value for the number of trees ntree and the number of variables per level mtry? Is there an…
DOSMarter