I've been training sklearn Random Forests on a regular basis over the last few months. I've noticed that when I export the model to a file with joblib, the file size has increased dramatically, from 2.5 GB to 11 GB. All the parameters have remained the same, and the number of training features is fixed; the only difference is that the number of examples in the training data has increased.
Given that the parameters have remained fixed, and the number of estimators is specified, why would increasing the number of training examples increase the size of the Random Forest?
Here are the parameters for the model:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max_features='sqrt', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=20, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=1000, n_jobs=-1,
                       oob_score=False, random_state=123, verbose=0, warm_start=False)
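For reference, here is a minimal sketch that reproduces the effect on synthetic data (make_classification, and a much smaller forest than my real one so it runs quickly). It compares the fitted forest's total node count and the joblib file size as the number of training examples grows:

import os
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same settings as above, except n_estimators is reduced so this runs quickly.
for n_samples in (10_000, 100_000):
    X, y = make_classification(n_samples=n_samples, n_features=50,
                               random_state=123)
    clf = RandomForestClassifier(n_estimators=100, min_samples_leaf=20,
                                 max_features='sqrt', n_jobs=-1,
                                 random_state=123)
    clf.fit(X, y)
    # Total number of nodes across all trees in the fitted forest.
    total_nodes = sum(est.tree_.node_count for est in clf.estimators_)
    joblib.dump(clf, 'rf.joblib')
    print(n_samples, total_nodes, os.path.getsize('rf.joblib'))

I'm assuming here that the dumped file size tracks the total node count, since each node is serialized with fixed-size fields.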