When tuning parameters in Optuna, I have an invalid subspace in my space of possible parameters. In my particular case, two of the parameters that I'm tuning can cause extremely long trials (that I want to avoid) if they are both close to zero (< 1e-5), i.e.:
|  | A > 1e-5 | A < 1e-5 |
|---|---|---|
| B > 1e-5 | OK | OK |
| B < 1e-5 | OK | TIMEOUT |
I'm obviously able to catch this edge case when both A < 1e-5 and B < 1e-5, but how should I let Optuna know that this is an invalid trial? I don't want to change the sampling ranges for A and B to exclude values < 1e-5, as it is fine if only one of A and B is < 1e-5.
I have two ideas so far:
1. Raise an Optuna pruning exception (`optuna.exceptions.TrialPruned`). This would prune the trial before the code times out, but I'm unsure whether this tells Optuna that this is a bad area of the search space to evaluate. If it does guide the tuning away from this edge case, then I think this is the best option.
2. Return some fixed trial score, e.g. 0. I know my trials will have a score between 0 and 1, so if this invalid edge case is reached, I could return the minimum possible score of 0. However, if most trial scores are 0.5 or greater, then a value of 0 for the edge case becomes an extreme outlier. (A short sketch of this option follows below.)
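For reference, here is a minimal sketch of option 2, assuming the [0, 1] score range described above; the objective body is a dummy stand-in rather than my real evaluation:

```python
import optuna

LIM = 1e-5
PENALTY_SCORE = 0.0  # worst possible score in the [0, 1] range, used as a fixed penalty


def objective(trial):
    a = trial.suggest_float('a', 0.0, 1.0)
    b = trial.suggest_float('b', 0.0, 1.0)
    if a < LIM and b < LIM:
        # Invalid region: skip the expensive evaluation and report the worst score.
        return PENALTY_SCORE
    # Dummy stand-in for the real evaluation; always returns a value in [0, 1].
    return 1.0 - 0.5 * (a + b)


study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
```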
MWE:
```python
import optuna


class MWETimeoutTuner:
    def __call__(self, trial):
        # Using a limit of 0.1 rather than 1e-5 so the edge case is triggered quicker
        lim = 0.1
        trial_a = trial.suggest_float('a', 0.0, 1.0)
        trial_b = trial.suggest_float('b', 0.0, 1.0)
        trial_c = trial.suggest_float('c', 0.0, 1.0)
        trial_d = trial.suggest_float('d', 0.0, 1.0)

        # Without this, we end up stuck in the infinite loop in _func_that_can_timeout.
        # But is pruning the trial the best way to avoid an invalid parameter configuration?
        if trial_a < lim and trial_b < lim:
            raise optuna.exceptions.TrialPruned

        def _func_that_can_timeout(a, b, c, d):
            # This mocks the timeout situation due to an invalid parameter configuration.
            if a < lim and b < lim:
                print('TIMEOUT:', a, b)
                while True:
                    pass
            # The maximum possible score would be 2 (c=1, d=1, a=0, b=0).
            # However, as only one of a and b can be less than 0.1, the actual maximum is 1.9,
            # at either (c=1, d=1, a=0, b=0.1) or (c=1, d=1, a=0.1, b=0).
            return c + d - a - b

        score = _func_that_can_timeout(trial_a, trial_b, trial_c, trial_d)
        return score


if __name__ == "__main__":
    tuner = MWETimeoutTuner()
    n_trials = 1000
    direction = 'maximize'
    study_uid = "MWETimeoutTest"
    study = optuna.create_study(direction=direction, study_name=study_uid)
    study.optimize(tuner, n_trials=n_trials)
```
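As a rough sanity check (a sketch that assumes the `study` object from the MWE above), the effect of the guard can be inspected after `study.optimize` returns by counting trial states:

```python
from optuna.trial import TrialState

pruned = study.get_trials(deepcopy=False, states=(TrialState.PRUNED,))
complete = study.get_trials(deepcopy=False, states=(TrialState.COMPLETE,))
print(f'Pruned trials:   {len(pruned)}')
print(f'Complete trials: {len(complete)}')
print(f'Best value:      {study.best_value}')
print(f'Best params:     {study.best_params}')
```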
I've found this related issue, which suggests changing the sampling range based on values that have already been sampled. In the MWE, that would look like:
```python
trial_a = trial.suggest_float('a', 0.0, 1.0)
if trial_a < lim:
    trial_b = trial.suggest_float('b', lim, 1.0)
else:
    trial_b = trial.suggest_float('b', 0.0, 1.0)
```
However, upon testing, this produces the following warning:
```
RuntimeWarning: Inconsistent parameter values for distribution with name "b"! This might be a configuration mistake. Optuna allows to call the same distribution with the same name more then once in a trial. When the parameter values are inconsistent optuna only uses the values of the first call and ignores all following. Using these values: {'low': 0.1, 'high': 1.0}.
```
So this doesn't seem to be a valid solution.
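If I understand the warning correctly, it is triggered because the distribution registered under the name `b` changes between trials. A possible way around that (just a sketch, with a made-up helper parameter name `b_frac`) would be to always suggest a fixed [0, 1] range and rescale the value afterwards:

```python
trial_a = trial.suggest_float('a', 0.0, 1.0)
# 'b_frac' always uses the same [0, 1] distribution, so no inconsistency warning.
b_frac = trial.suggest_float('b_frac', 0.0, 1.0)
if trial_a < lim:
    trial_b = lim + b_frac * (1.0 - lim)  # rescale into [lim, 1.0]
else:
    trial_b = b_frac
```

That avoids the warning, but Optuna then models `b_frac` rather than the actual `b`, so I'm not convinced it is any better than simply pruning.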
In the MWE, raising the pruning exception works and (near-)optimal values are found. It seems that in writing this question I have almost answered it myself: pruning looks like the way to go, unless there is a better solution?