Here is the help for `sklearn.ensemble.RandomForestClassifier.fit()`. It does not say whether there can be a problem when `X` and `y` are sorted by label (e.g. all samples of class 0 first, then all samples of class 1, and so on). My preliminary test suggests that it does not matter whether `X` and `y` are sorted.
Is my conclusion correct?
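A test like the one described can be sketched as follows. The details are assumptions not taken from the question: Iris stands in for the real data, the split is a stratified 70/30 one, and the forest uses `n_estimators=200`. The same training rows are fitted once sorted by label and once in a random order, and both forests are scored on the same held-out set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: Iris stands in for the asker's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Training rows ordered by label (stable sort keeps the original order within a class).
order = np.argsort(y_train, kind="stable")
X_sorted, y_sorted = X_train[order], y_train[order]

# The same training rows in a random order.
perm = np.random.default_rng(0).permutation(len(y_train))
X_shuffled, y_shuffled = X_train[perm], y_train[perm]


def fit_and_score(X_fit, y_fit):
    """Fit a forest with a fixed seed and return accuracy on the shared test set."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_fit, y_fit)
    return clf.score(X_test, y_test)


sorted_acc = fit_and_score(X_sorted, y_sorted)
shuffled_acc = fit_and_score(X_shuffled, y_shuffled)
print(f"sorted: {sorted_acc:.3f}  shuffled: {shuffled_acc:.3f}")
```

Even with the same `random_state`, the two forests are generally not identical: the bootstrap draws row *positions*, so reordering the rows changes which samples each tree sees. The two accuracies should therefore be close, but they need not match exactly.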
Help on class RandomForestClassifier in module sklearn.ensemble._forest:
class RandomForestClassifier(ForestClassifier)
...
| Build a forest of trees from the training set (X, y).
|
| Parameters
| ----------
| X : {array-like, sparse matrix} of shape (n_samples, n_features)
|     The training input samples. Internally, its dtype will be converted
|     to ``dtype=np.float32``. If a sparse matrix is provided, it will be
|     converted into a sparse ``csc_matrix``.
|
| y : array-like of shape (n_samples,) or (n_samples, n_outputs)
|     The target values (class labels in classification, real numbers in
|     regression).
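The docstring does not mention row order. With the default `bootstrap=True`, each tree is trained on rows drawn uniformly at random with replacement, so a label-sorted order should not bias the forest itself. Sorted data can cause trouble in steps around `fit()` that split rows by position. The sketch below shows this with `KFold`, which does not shuffle by default. It again assumes Iris, whose rows come grouped by class: with three unshuffled folds, each test fold is a single class that never appears in its training folds.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Iris rows are grouped by label: 50 of class 0, then 50 of class 1, then 50 of class 2.
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Without shuffling, each contiguous test fold is exactly one class,
# and that class is absent from the training folds, so every prediction is wrong.
unshuffled = cross_val_score(clf, X, y, cv=KFold(n_splits=3))

# Shuffling before splitting gives every fold a mix of classes.
shuffled = cross_val_score(
    clf, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=0)
)

print("unshuffled:", unshuffled)  # all three scores are 0.0
print("shuffled:  ", shuffled)
```

Passing an integer such as `cv=3` to `cross_val_score` with a classifier uses `StratifiedKFold`, which keeps every class in every fold and so avoids this particular failure even on sorted data.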