2

I am just doing my first ML-with-optuna project. My question is: how can I probe one set of hyperparameters for multiple NN initializations, where each run within one trial is still subject to pruning?

I am assuming that the initialization has quite some influence and I don't want to strike out good HP due to bad luck.

As far as I know, each trial represents one set of HP. So if I want to evaluate them for multiple initializations, I perform multiple trainings per trial. But within one trial I can only report one value for each step.

Do I have to implement this without optuna? Should I go for an approach which lets optuna first suggest a set of HP and then fixes it for the next trials? Or do you know a good approach to achieve this with optuna?

Many thanks in advance!

Edit 1: Adding a minimal code example:

from random import randrange
import optuna


def objective(trial):
    """
    return x * 20 + random_offset
    multiplication calculated iteratively to enable pruning
    """

    x = trial.suggest_float("x", 0, 10)

    random_offset = randrange(0, 1000)
    temp = 1
    res_temp = None
    for i in range(20):
        temp += x
        res_temp = temp + random_offset
        trial.report(res_temp, i)

        if trial.should_prune():
            raise optuna.TrialPruned()

    return res_temp


if __name__ == '__main__':
    study = optuna.create_study(pruner=optuna.pruners.MedianPruner())

    study.optimize(objective, n_trials=20)

    print("best params:", study.best_params)
    print("best value:", study.best_value)

This example tries to find the "x" in a range of 0 to 10 which minimizes "x * 20". The obvious answer is 0. The objective function calculates the result by iterative summation, which enables pruning. Sadly, the objective function is noisy due to the random offset. This is meant as a metaphor for training a NN: the iteration is the training loop, x is the hyperparameter, and the offset is the random initialization of the network.

The problem caused by the noise is that you can't determine the quality of a hyperparameter for sure, as the result might be dominated by the random offset. This might lead to selecting a sub-optimal x. If I am right, then simply increasing the number of trials to smooth out the randomness might not work, as optuna suggests new hyperparameters based on the old ones, so unlucky observations will hinder further progress.

So I assumed it would be best to evaluate the objective several times for the same set of hyperparameters and only remember the best "run".

So my question is: how do I best smooth out the noise? Is my assumption correct that only increasing the number of trials is not the best approach, and how would you implement the repeated evaluation?

Osmosis D. Jones

2 Answers

2

One way to achieve this is to define a wrapper around the objective. This works because the wrapper is called once per new trial, but inside the wrapper we call the original objective multiple times.

Toy example:

import optuna
import random

def objective(trial, seed=0):
    random.seed(seed)
    a = trial.suggest_float('test', 0, 1)
    return a


def objective_wrapper(trial, nrseeds):
    res = []
    for ii in range(nrseeds):
        rr = objective(trial, seed=ii)
        res.append(rr)

    # add the individual results as an attribute to the trial if you want
    trial.set_user_attr("individual_seed_results", res)

    # let's print just to visualize the individual runs
    print('=====')
    print(res)

    return sum(res)/len(res) #could be some other aggregation

study = optuna.create_study(
    study_name='tst',
)

study.optimize(
    lambda trial: objective_wrapper(trial, 3),
    n_trials=5,
)

If you run this, the wrapper will print something like:

=====
[0.9422219634474698, 0.9422219634474698, 0.9422219634474698]
=====
[0.3789947506000524, 0.3789947506000524, 0.3789947506000524]
=====
[0.25406979924952877, 0.25406979924952877, 0.25406979924952877]
=====
[0.6927210276975587, 0.6927210276975587, 0.6927210276975587]
=====
[0.3583263556988684, 0.3583263556988684, 0.3583263556988684]
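
If per-step pruning should still work while the runs are repeated, one option (not shown in the toy example above; a minimal sketch adapted from the question's noisy toy objective) is to advance all repeats in lockstep and report a single aggregated value per step, since a trial can only hold one reported value per step. The repeat count N_SEEDS, the helper noisy_run and the function names are made up for illustration:

from random import randrange
import optuna

N_SEEDS = 3  # hypothetical number of repeats ("initializations") per trial


def noisy_run(x, offset, n_steps=20):
    """One 'training run': yields the intermediate value at every step."""
    temp = 1
    for _ in range(n_steps):
        temp += x
        yield temp + offset


def objective_lockstep(trial):
    x = trial.suggest_float("x", 0, 10)
    # one generator per "initialization"; for a real NN this would be one model per seed
    runs = [noisy_run(x, randrange(0, 1000)) for _ in range(N_SEEDS)]

    mean_value = None
    for step in range(20):
        values = [next(run) for run in runs]    # advance every run by one step
        mean_value = sum(values) / len(values)  # aggregate across initializations
        trial.report(mean_value, step)          # one reported value per step, as required

        if trial.should_prune():                # pruning acts on the aggregated curve
            raise optuna.TrialPruned()

    return mean_value


study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective_lockstep, n_trials=20)

Averaging is just one possible aggregation; taking min(values) would instead keep only the best run per step, as suggested in the question.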
opocaj
0

Since your objective also depends on randomness, it is best to evaluate it several times, as you have assumed.

But even better, try to identify where the randomness comes from: is it the seed number? If not, then you really need more trials and more evaluations over complete epochs.

It would look something like this from the optuna example. Each epoch or step, the model is evaluated n_train_iter times for the same parameter.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

import optuna

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
classes = np.unique(y)


def objective(trial):
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    clf = SGDClassifier(alpha=alpha)
    n_train_iter = 100

    for step in range(n_train_iter):
        clf.partial_fit(X_train, y_train, classes=classes)

        intermediate_value = clf.score(X_valid, y_valid)
        trial.report(intermediate_value, step)

        if trial.should_prune():
            raise optuna.TrialPruned()

    return clf.score(X_valid, y_valid)


study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(
        n_startup_trials=5, n_warmup_steps=30, interval_steps=10
    ),
)
study.optimize(objective, n_trials=20)

You can go further by calling

X_train, X_valid, y_train, y_valid = train_test_split(X, y)

multiple times, just to find the best objective value.
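
A minimal sketch of that idea (assuming the same Iris setup as above; the number of resplits, n_splits, is an arbitrary choice, and pruning is left out for brevity):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

import optuna

X, y = load_iris(return_X_y=True)
classes = np.unique(y)


def objective_resplit(trial, n_splits=3):
    """Average the final validation score over several random train/valid splits."""
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    scores = []
    for split_seed in range(n_splits):
        # a fresh split per repetition, so a single lucky/unlucky split matters less
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=split_seed)
        clf = SGDClassifier(alpha=alpha, random_state=split_seed)
        for _ in range(100):
            clf.partial_fit(X_tr, y_tr, classes=classes)
        scores.append(clf.score(X_va, y_va))
    return sum(scores) / len(scores)


study = optuna.create_study(direction="maximize")
study.optimize(lambda trial: objective_resplit(trial, n_splits=3), n_trials=20)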

ferdy
  • First: many thanks for helping :) 2nd paragraph: The randomness is from the seed number. Isn't that nearly always the case on a PC? Fixing the seed number doesn't seem quite like the right approach, more like cheating, more like remembering the one lucky trial. If you go for that, you could choose all HP randomly and then only remember the best seed, couldn't you? I agree that it is worth making the training as robust as possible by identifying sources of randomness. 3rd paragraph: Seems the statement doesn't fit the code. Each step (loop body) evaluates only once. Did I miss something? – Osmosis D. Jones Feb 01 '22 at 11:26