I am working on my first ML project with optuna. My question is: how can I evaluate one set of hyperparameters across multiple NN initializations, where each run within one trial is still subject to pruning?
I assume that the initialization has a considerable influence on the result, and I don't want to discard good hyperparameters just because of bad luck.
As far as I know, each trial represents one set of hyperparameters. So if I want to evaluate them for multiple initializations, I have to perform multiple trainings per trial. But within one trial I can only report one value per step.
Do I have to implement this without optuna? Should I go for an approach that lets optuna suggest a set of hyperparameters first and then fixes it for the next few trials? Or do you know a good way to achieve this with optuna?
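One direction I can imagine looks roughly like the sketch below: train a few differently seeded copies of the model inside a single objective call and report an aggregated value at each step, so the pruner still gets one number per step. Note that build_model and train_one_epoch are hypothetical placeholders for my actual training code, and the number of seeds is arbitrary.

    import statistics

    import optuna


    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)

        n_seeds = 3  # number of random initializations per hyperparameter set
        models = [build_model(seed=s) for s in range(n_seeds)]  # hypothetical helper

        best = float("inf")
        for epoch in range(20):
            # hypothetical helper returning the validation loss after one epoch
            losses = [train_one_epoch(model, lr) for model in models]
            best = min(best, min(losses))

            # only one value can be reported per step, so aggregate over the seeds
            trial.report(statistics.mean(losses), epoch)
            if trial.should_prune():
                raise optuna.TrialPruned()

        return best

But I am not sure whether aggregating like this interacts well with the pruner, which is exactly what I am asking about.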
Many thanks in advance!
Edit 1: Adding a minimal code example:
    from random import randrange

    import optuna


    def objective(trial):
        """
        Return x * 20 + random_offset.

        The multiplication is calculated iteratively to enable pruning.
        """
        x = trial.suggest_float("x", 0, 10)
        random_offset = randrange(0, 1000)
        temp = 1
        res_temp = None
        for i in range(20):
            temp += x
            res_temp = temp + random_offset
            trial.report(res_temp, i)
            if trial.should_prune():
                raise optuna.TrialPruned()
        return res_temp


    if __name__ == '__main__':
        study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
        study.optimize(objective, n_trials=20)
        print("best params:", study.best_params)
        print("best value:", study.best_value)
This example tries to find the "x" in the range 0 to 10 that minimizes "x * 20". The obvious answer is 0. The objective function computes the result by iterative summation, which makes pruning possible. Unfortunately, the objective function is noisy due to the random offset. This is meant as a metaphor for training a NN: the loop is the training loop, x is the hyperparameter, and the offset is the random initialization of the network.
The problem caused by the noise is that you can't reliably judge the quality of a hyperparameter, since the result might be dominated by the random offset. This can lead to selecting a sub-optimal x. If I am right, then simply increasing the number of trials to smooth out the randomness might not work either, because optuna suggests new hyperparameters based on the old observations, so unlucky observations will hinder further progress.
So I assumed it would be best to evaluate the objective several times for the same set of hyperparameters and only keep the best "run".
So my question is: what is the best way to smooth out the noise? Is my assumption correct that simply increasing the number of trials is not the best approach, and how would you implement the repeated evaluation?
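To make the idea concrete for the minimal example above, this is roughly what I have in mind; the number of repeats and the choice of the mean for reporting / the minimum for the final value are arbitrary placeholders I picked:

    from random import randrange
    from statistics import mean

    import optuna

    N_REPEATS = 5  # arbitrary number of repeats per set of hyperparameters


    def objective(trial):
        """Same toy objective as above, but evaluated for several random offsets."""
        x = trial.suggest_float("x", 0, 10)
        offsets = [randrange(0, 1000) for _ in range(N_REPEATS)]

        temps = [1.0] * N_REPEATS
        results = None
        for i in range(20):
            temps = [temp + x for temp in temps]
            results = [temp + offset for temp, offset in zip(temps, offsets)]

            # report the mean over the repeats so the pruner still sees one value per step
            trial.report(mean(results), i)
            if trial.should_prune():
                raise optuna.TrialPruned()

        # keep only the best run, as described above (returning the mean would be
        # the other obvious choice)
        return min(results)


    if __name__ == '__main__':
        study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
        study.optimize(objective, n_trials=20)
        print("best params:", study.best_params)
        print("best value:", study.best_value)

Is this a reasonable way to do it, or is there a more idiomatic optuna mechanism for repeated evaluations per trial?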