I am using Optuna for parameter optimization for some models.

In almost all the examples, the objective function returns an evaluation metric computed on the TEST set and tries to minimize/maximize it. This feels like a flaw in the examples, since Optuna then tunes its parameters on data that is supposed to remain unseen.

Optimizing on a cross-validation of the training set would, in my opinion, be more robust; a sketch of what I mean is below. I would like to hear other thoughts and check whether I am missing something.
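A minimal sketch of the setup I have in mind, assuming scikit-learn with a RandomForestClassifier as a placeholder model (the dataset, hyperparameter ranges, and trial count are all illustrative):

```python
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Hold out a test set that Optuna never sees during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
    }
    model = RandomForestClassifier(**params, random_state=0)
    # Score each trial by 5-fold CV on the training set only.
    return cross_val_score(model, X_train, y_train, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# The test set is touched exactly once, for the final evaluation.
best_model = RandomForestClassifier(**study.best_params, random_state=0)
best_model.fit(X_train, y_train)
print(best_model.score(X_test, y_test))
```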

Thanks!

brian

2 Answers

No, it is not a flaw; it is a feature. Performance should be evaluated on a test data set not seen by the algorithm.

If you want to cross-validate, it might take you months to complete even a simple study with Optuna. It is not wrong to do that, but it is probably a waste of time, because Optuna's algorithm is a Bayesian optimizer, which cross-validation can only approximate.

That being said, if you are doing machine learning and need a train/validate loop per epoch, I recommend using Jun Shao's proportion of n**(0.75) as your training set size, chosen randomly before training starts; not only is it faster, it is probably better. A rough sketch of that split is below.
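A rough sketch of such a split, assuming a plain NumPy index shuffle (the helper name shao_split and the sample count are mine; only the n**(0.75) proportion comes from the answer above):

```python
import numpy as np

def shao_split(n_samples, seed=0):
    """Randomly pick a training set of size floor(n**0.75), per Shao's
    proportion; the remaining indices form the validation set."""
    rng = np.random.default_rng(seed)
    n_train = int(n_samples ** 0.75)
    idx = rng.permutation(n_samples)
    return idx[:n_train], idx[n_train:]

train_idx, val_idx = shao_split(10_000)
print(len(train_idx), len(val_idx))  # 1000 train, 9000 validation
```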

So while machine learning does require multiple training and validation passes, it is not necessary to cross-validate the model's performance if you are using Optuna. Please click the link above to see my answer on the Cross Validated SE site, and from there you can click through to the GitHub repo, but please comment first and/or see what others are saying.

brethvoice

For me it is a flaw, as the same data would be used both for finding the optimal hyperparameters and for the final evaluation. There is no room left for checking generalization.

What can be used: cross-validation inside the optimization, to get an even more generalized view of the data ([1]). What should be used: testing the most promising models on a hold-out/test set while tuning only on the train set (a good overview can be found in [2]). A minimal sketch of that workflow is below.
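A minimal sketch of that workflow using OptunaSearchCV from [1], with SVC as a placeholder estimator (depending on your Optuna version, OptunaSearchCV may live in the separate optuna-integration package; the search space and trial count are illustrative):

```python
import optuna
from optuna.distributions import FloatDistribution
from optuna.integration import OptunaSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tune with 5-fold CV on the training set only.
search = OptunaSearchCV(
    SVC(),
    {"C": FloatDistribution(1e-3, 1e3, log=True)},
    cv=5,
    n_trials=30,
    random_state=0,
)
search.fit(X_train, y_train)

# The hold-out test set is used once, for the final generalization check.
print(search.best_params_, search.score(X_test, y_test))
```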

[1] https://optuna.readthedocs.io/en/stable/reference/generated/optuna.integration.OptunaSearchCV.html

[2] https://stats.stackexchange.com/questions/366862/is-using-both-training-and-test-sets-for-hyperparameter-tuning-overfitting

mickythump