I am analyzing RandomForestClassifier and need some help. The max_features parameter sets the maximum number of features considered for a split in a random forest, and it is generally set to sqrt(n_features). If m is the square root of n, then the number of feature combinations available for building decision trees is nCm. What if nCm is less than n_estimators (the number of decision trees in the random forest)?
Example: for n = 7, max_features is 3, so nCm is 35, meaning there are 35 unique combinations of features for the decision trees. Now, for n_estimators = 100, will the remaining 65 trees reuse combinations of features? If so, won't the trees be correlated, introducing bias in the answer?
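
To make the counting in my example concrete, here is a minimal Python sketch of the arithmetic. It only reproduces the numbers in the question (n = 7, m = 3, 100 trees) using the standard library. The variable names are mine, and it assumes, as the question does, that each tree would be tied to one feature combination. It says nothing about how scikit-learn actually picks features.

```python
from itertools import combinations
from math import comb

n_features = 7      # n in the question
max_features = 3    # m in the question
n_estimators = 100  # number of trees in the forest

# Number of distinct feature subsets of size m drawn from n features (nCm).
n_subsets = comb(n_features, max_features)

# Enumerate the subsets explicitly to confirm the count.
subsets = list(combinations(range(n_features), max_features))
assert len(subsets) == n_subsets

# Trees left over if every tree were tied to one distinct subset.
# This is the premise of the question, not a statement about scikit-learn.
n_leftover = max(0, n_estimators - n_subsets)

print(n_subsets, n_leftover)  # prints: 35 65
```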