
Which of the following interpretations of `mtry` is correct?

  • The number of features available to each tree (a random subset of features is sampled once per tree)
  • The number of features considered at each split

For the equivalent parameter in Python, `max_features` of scikit-learn's `RandomForestClassifier`, I believe the first interpretation is true. What is the situation in R?

Thank you!

Cloudy

1 Answer


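In both `ranger` and `randomForest`, `mtry` is the number of features randomly sampled as split candidates at each node, so your second interpretation is the correct one. The `max_features` parameter in scikit-learn works the same way: it is the number of features considered when looking for the best split, not a per-tree subset.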
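To make the per-node behaviour concrete, here is a minimal Python sketch (not the actual `randomForest` or `ranger` code) of how a single node might choose its split. It samples `mtry` candidate features without replacement, evaluates single-feature rules `x[j] >= c` by weighted Gini impurity, and keeps the best one. The function names `choose_split` and `gini`, and the use of Gini impurity as the criterion, are illustrative assumptions. A full tree would call this afresh at every node, so each node gets its own random subset of features.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def choose_split(X, y, mtry, rng):
    """Hypothetical node-level split search with per-node feature sampling.

    X is a list of rows (lists of numbers), y a list of labels.
    Returns (feature_index, threshold, candidates) for the best rule
    `x[feature] >= threshold`, or (None, None, candidates) if no
    candidate threshold separates the rows.
    """
    n_features = len(X[0])
    # Draw a fresh random subset of mtry features for THIS node only.
    candidates = rng.sample(range(n_features), mtry)
    best_feature, best_threshold = None, None
    best_impurity = float("inf")
    for j in candidates:
        for c in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] < c]
            right = [label for row, label in zip(X, y) if row[j] >= c]
            if not left or not right:
                continue  # this threshold does not split the node
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if impurity < best_impurity:
                best_impurity = impurity
                best_feature, best_threshold = j, c
    # The winning rule always involves exactly one feature.
    return best_feature, best_threshold, candidates


rng = random.Random(0)
X = [[1, 5, 0], [2, 3, 1], [3, 8, 0], [4, 1, 1]]
y = ["a", "a", "b", "b"]
print(choose_split(X, y, mtry=2, rng=rng))
```

Raising `mtry` only widens the pool of candidate features searched at each node; the resulting rule is still a threshold on a single feature.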

phiver
  • Normally, the splitting rule is something like `x1 >= c`. If `mtry` is larger than 2, does the splitting rule become something like `x1 + x2 >= c`? – Cloudy May 08 '22 at 12:49
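  • @Cloudy, the default is the square root of the number of variables, rounded down (`randomForest` uses p/3 for regression), so with 10 variables `mtry` is 3. If you set `mtry = 4`, each split will consider 4 candidate variables instead of 3 and pick the best of them; the rule itself still uses a single variable, such as `x1 >= c`. – phiver May 08 '22 at 14:18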
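The default arithmetic can be checked with a small Python helper. This is a sketch that mirrors the documented `randomForest` defaults, `floor(sqrt(p))` for classification and `max(floor(p / 3), 1)` for regression (`ranger` uses the rounded-down square root); the function name `default_mtry` is hypothetical and not part of either package.

```python
import math


def default_mtry(p, classification=True):
    """Hypothetical helper mirroring randomForest's documented mtry defaults.

    p is the number of predictor variables.
    """
    if p < 1:
        raise ValueError("need at least one predictor")
    if classification:
        return math.isqrt(p)  # floor(sqrt(p)), e.g. 10 -> 3
    return max(p // 3, 1)  # floor(p / 3), but never below 1


for p in (4, 10, 100):
    print(p, default_mtry(p), default_mtry(p, classification=False))
```

With 10 variables both rules happen to give 3, matching the example in the comment; passing `mtry = 4` explicitly simply overrides the default.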