
Simple question; I can find others with similar issues but no real solutions. TL;DR: n_jobs=-1 enables simultaneous execution of trials, but no extra cores are utilized on my CPU and the total runtime is slightly longer than with n_jobs=1.

Why is it that with this test code:

import optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
import datetime as dt
print('optuna.__version__: ', optuna.__version__)

def obj(trial, X, y):
    params = {
        'a': trial.suggest_int('a', 1, 10),
        'b': trial.suggest_float('b', 0, 10, step=.1),
        'c': trial.suggest_categorical('c', [True, False])
    }

    if params['c']:
        yhat = params['a']*X + params['b']
    else:
        yhat = params['a']*X + params['b'] + 1

    error = abs(y-yhat)
    return error

X, y = 3, 10
study = optuna.create_study(study_name='test', direction='minimize')
start = dt.datetime.now()
print('starting optimization with n_jobs=1 at ', start)
study.optimize(
    lambda trial: obj(trial, X, y),
    n_jobs=1, n_trials=1000)
runtime = dt.datetime.now()-start
best_params = study.best_params
print('n_jobs=1', 'runtime: ', runtime, 'best_params: ', best_params)

study = optuna.create_study(study_name='test', direction='minimize')
start = dt.datetime.now()
print('starting optimization with n_jobs=-1 at ', start)
study.optimize(
    lambda trial: obj(trial, X, y),
    n_jobs=-1, n_trials=1000)
runtime = dt.datetime.now()-start
best_params = study.best_params
print('n_jobs=-1', 'runtime: ', runtime, 'best_params: ', best_params)

error_best_result = obj(study.best_trial, X, y)
print('error best result: ', error_best_result)

Which yields this output:

optuna.__version__:  3.0.3
starting optimization with n_jobs=1 at  2022-10-18 14:08:17.323609
n_jobs=1 runtime:  0:00:08.097332 best_params:  {'a': 1, 'b': 7.0, 'c': True}
starting optimization with n_jobs=-1 at  2022-10-18 14:08:25.421941
n_jobs=-1 runtime:  0:00:09.394324 best_params:  {'a': 3, 'b': 0.0, 'c': False}
error best result:  0.0

When I comment out optuna.logging.set_verbosity(optuna.logging.WARNING), I can clearly see that the first study executes trials one after the other, while the second study executes trials asynchronously. I have 32 cores, and the maximum spread between trial numbers that complete back to back appears to be about 30. So far so good, that seems logical. I would expect my CPU to spike to 100% (especially when I replace the test objective with an actually interesting objective that requires some real computation). Most importantly, I would expect the concurrently executed trials to finish orders of magnitude faster than the sequentially executed ones.

This is not the case. Any idea why?
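
For reference, this is roughly how one could log per-core utilization while the study runs, just to make the observation concrete (a minimal sketch assuming psutil is installed; in practice I am simply watching the system monitor, and the 50% threshold is arbitrary):

import threading

import optuna
import psutil

def obj(trial):
    # A simplified version of the toy objective above, inlined so this snippet is self-contained.
    a = trial.suggest_int('a', 1, 10)
    b = trial.suggest_float('b', 0, 10, step=0.1)
    return abs(10 - (a * 3 + b))

def log_cpu(stop_event, interval=1.0):
    # Print how many cores are more than 50% busy, once per interval.
    while not stop_event.is_set():
        per_core = psutil.cpu_percent(interval=interval, percpu=True)
        busy = sum(1 for p in per_core if p > 50)
        print(f'cores >50% busy: {busy}/{len(per_core)}')

stop = threading.Event()
monitor = threading.Thread(target=log_cpu, args=(stop,), daemon=True)
monitor.start()

study = optuna.create_study(direction='minimize')
study.optimize(obj, n_jobs=-1, n_trials=1000)

stop.set()
monitor.join()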

When I go to the documentation, it suggests process parallelization. I don't have any experience doing that, and the suggested approach of "open another terminal window for each process" is not feasible for my use case. Am I missing something?
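
For context, here is roughly what I understand the documentation's process parallelization to look like when launched from a single script instead of multiple terminals (a sketch only; the SQLite path, the worker count, and the run_worker helper are my own guesses, not from the docs):

import multiprocessing as mp

import optuna

STORAGE = 'sqlite:///parallel_test.db'  # hypothetical shared storage
STUDY_NAME = 'test_parallel'

def obj(trial, X=3, y=10):
    # Same toy objective as in the question.
    params = {
        'a': trial.suggest_int('a', 1, 10),
        'b': trial.suggest_float('b', 0, 10, step=0.1),
        'c': trial.suggest_categorical('c', [True, False])
    }
    yhat = params['a']*X + params['b'] + (0 if params['c'] else 1)
    return abs(y - yhat)

def run_worker(n_trials):
    # Each process attaches to the same study via the shared storage.
    study = optuna.load_study(study_name=STUDY_NAME, storage=STORAGE)
    study.optimize(obj, n_trials=n_trials)

if __name__ == '__main__':
    optuna.create_study(study_name=STUDY_NAME, storage=STORAGE,
                        direction='minimize', load_if_exists=True)
    workers = [mp.Process(target=run_worker, args=(250,)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()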

Edit: I remembered someone mentioning version 2.0.0 as working for parallelization, so I made a new env and tested it out:

starting optimization with n_jobs=1 at  2022-10-18 14:37:53.917227
n_jobs=1 runtime:  0:00:04.603000 best_params:  {'a': 3, 'b': 0.0, 'c': False}
starting optimization with n_jobs=-1 at  2022-10-18 14:37:58.521227
n_jobs=-1 runtime:  0:00:05.715110 best_params:  {'a': 2, 'b': 4.0, 'c': True}

These results are very confusing: why is it twice as fast overall, yet still no faster when parallelized?

