
I'm performing RandomForest and AdaBoost Regression in python

My results are not reproducible [my predictions change every time I run the same code on the same data].

seed = np.random.seed(22)
rng = np.random.RandomState(1)      

param_grid = {'n_estimators': [10, 100, 1000]}
model_rfr = GridSearchCV(RandomForestRegressor(random_state = rng), param_grid, cv=3, n_jobs=-1, verbose=1)
model_rfr.fit(train_x1,train_y1)
test_rfr = model_rfr.predict(test_y1)
param_grid = {"n_estimators":[100,500],"learning_rate":list(np.linspace(0.01,1,10)),"loss":["linear", "square", "exponential"]}
model_adr = RandomizedSearchCV(AdaBoostRegressor(DecisionTreeRegressor()), param_grid,n_jobs=-1,n_iter=10,cv=3,random_state = rng)
model_adr.fit(train_x1,train_y1)
test_adr = model_adr.fit(test_y1)

Here the test_adr and test_rfr values change every single time I run my code.

Kindly use any sample data for regression, but please suggest how to make my results reproducible.

Teja S
  • ++ Now Gridsearch result is reproducible, Suggest for RandomizedCV – Teja S Aug 23 '17 at 10:47
  • Also, `predict()` method should be sent with `test_x1` whereas you are sending `test_y1`, and last line of your code should be `model_adr.predict()`, not `model_adr.fit()` – Vivek Kumar Aug 23 '17 at 11:09
  • AdaBoostRegressor and DecisionTreeRegressor have the `random_state` param as well. With those set, I am able to duplicate the results – Vivek Kumar Aug 23 '17 at 11:18
  • Thanks for your help, I got it now. But with some random_state value I'm getting much better result, can I optimize it ? and make it reproducible ? – Teja S Aug 23 '17 at 11:45
  • No, `random_state` is not meant to be optimized. Consider this as getting lucky. This may not be case everytime. Only believe on the average output of a multi-fold cross-validation. – Vivek Kumar Aug 23 '17 at 11:51
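The last comment can be illustrated with a minimal sketch (using `make_regression` as stand-in sample data, since the question's data is not shown): rather than tuning `random_state`, compare the mean of the cross-validation scores across a few seeds. The average over folds is the number to trust; the spread between seeds is just luck.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic sample data standing in for train_x1 / train_y1.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Mean 3-fold CV score for a few different seeds: individual seeds jitter,
# but the fold-averaged score is what to report and compare.
for seed in (0, 1, 2):
    scores = cross_val_score(
        RandomForestRegressor(n_estimators=50, random_state=seed), X, y, cv=3
    )
    print(seed, round(scores.mean(), 3))
```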

1 Answer


Thanks for your contribution. Here is the corrected code with reproducible results: `random_state` is set on every estimator and on both searches, and `predict()` is called with `test_x1` instead of `test_y1`.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

np.random.seed(22)
rng = np.random.RandomState(1)

param_grid = {'n_estimators': [10, 100, 1000]}
model_rfr = GridSearchCV(RandomForestRegressor(random_state=rng), param_grid, cv=3, n_jobs=-1, verbose=1)
model_rfr.fit(train_x1, train_y1)
test_rfr = model_rfr.predict(test_x1)

param_grid = {"n_estimators": [100, 500], "learning_rate": list(np.linspace(0.01, 1, 10)), "loss": ["linear", "square", "exponential"]}
model_adr = RandomizedSearchCV(AdaBoostRegressor(DecisionTreeRegressor(random_state=rng), random_state=rng), param_grid, n_jobs=-1, n_iter=10, cv=3, random_state=rng)
model_adr.fit(train_x1, train_y1)
test_adr = model_adr.predict(test_x1)
Teja S
  • want to highlight that the important part is setting both the `random_state` and `np.random.seed` – Tim Sep 21 '22 at 18:45