
I'm trying to check how Random Forest regressor performance is affected by n_estimators.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# train_x1, train_y1, test_x1, test_y1 and scale_data are defined earlier (not shown)
np.random.seed(1962)
rng = np.random.RandomState(1962)

# Numbers of trees to compare
estimators = [8, 10, 16, 32, 64, 128, 256, 500, 512, 1024, 2048]
train_acc = {}
test_acc = {}
for n_trees in estimators:
    modelrfe = RandomForestRegressor(n_estimators=n_trees, random_state=rng, n_jobs=-1)
    modelrfe.fit(train_x1, train_y1)
    # Predictions mapped back to the original (unscaled) target scale
    train_pred = scale_data.inverse_transform(modelrfe.predict(train_x1).reshape(-1, 1))
    test_pred = scale_data.inverse_transform(modelrfe.predict(test_x1).reshape(-1, 1))
    # Mean absolute error per n_estimators value
    train_acc[n_trees] = mean_absolute_error(scale_data.inverse_transform(train_y1.reshape(-1, 1)), train_pred)
    test_acc[n_trees] = mean_absolute_error(scale_data.inverse_transform(test_y1.reshape(-1, 1)), test_pred)



train_acc = pd.DataFrame(train_acc.items())
train_acc.columns = ['keys','Trainerror']
test_acc = pd.DataFrame(test_acc.items())
test_acc.columns = ['keys','Testerror']
error_df3 = pd.merge(train_acc, test_acc, on='keys')
error_df3 = pd.DataFrame(error_df3)

The results are not reproducible, even though I defined rng at the beginning.

NOTE: Imagine an outer for loop over 1:nrow(dataframe). Each iteration runs multiple models, and I set rng and the seed at the beginning of that loop.

Please help!

Two sample outputs, which ideally should have been identical. Here keys refers to n_estimators.

[Image: Simulation 1] [Image: Simulation 2]

Teja S
  • 19
  • 1
  • 5
  • You forgot to attach the images. Also, it would be better if you could provide a [MCVE](https://stackoverflow.com/help/mcve) – Vivek Kumar Nov 27 '17 at 11:42
  • It's long code, which is why I haven't added the rest of it. I'm asking how to make the algorithm reproducible, e.g. by using the random_state parameter – Teja S Nov 27 '17 at 11:53

1 Answer


Here is the answer:

Mistake: I was passing a RandomState instance, rng = np.random.RandomState(1962), as random_state.

Instead, I should pass the seed value as an int in the random_state parameter,

i.e. rng = 1962.

Using that random_state value in the model then gives reproducible results.
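
To illustrate the difference, here is a minimal sketch rather than the original code. It assumes synthetic data from make_regression in place of train_x1/test_x1 (which are not shown in the question), a hypothetical helper named sweep, and a much smaller list of n_estimators values to keep it fast. An int seed makes every fit start from the same state, while a shared RandomState instance is advanced by each fit, so a second pass over the same loop starts from a different state.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

SEED = 1962  # integer seed, as suggested in the answer

# Synthetic stand-in data (assumption: the real arrays are not shown in the question).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=SEED
)


def sweep(random_state, estimators=(8, 16, 32)):
    """Hypothetical helper: return {n_estimators: test MAE} for one pass."""
    errors = {}
    for n_trees in estimators:
        model = RandomForestRegressor(n_estimators=n_trees, random_state=random_state)
        model.fit(X_train, y_train)
        errors[n_trees] = mean_absolute_error(y_test, model.predict(X_test))
    return errors


def same(a, b):
    """True if both passes produced (numerically) the same error per key."""
    return all(np.isclose(a[k], b[k]) for k in a)


# Integer seed: each fit is seeded afresh, so repeated passes match.
int_run_1 = sweep(SEED)
int_run_2 = sweep(SEED)

# Shared RandomState instance: each fit consumes draws from the same object,
# so the second pass does not start from the state the first pass started from.
rng = np.random.RandomState(SEED)
rng_run_1 = sweep(rng)
rng_run_2 = sweep(rng)

print("int seed, passes match:  ", same(int_run_1, int_run_2))
print("shared rng, passes match:", same(rng_run_1, rng_run_2))
```

If a RandomState instance is still wanted, creating a fresh one immediately before each pass (rather than reusing one object across passes) should also restore a repeatable starting state.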

Teja S