I'm tuning a model with the Keras Tuner BayesianOptimization tuner. The tuning objective is val_loss, which is computed after each epoch. As I understand it, the tuner goes through various hyperparameter configurations, trains the model for each one, and keeps track of val_loss. It saves the model weights from the epoch with the lowest (best) val_loss. After tuning, the tuner method get_best_models should return the model that had the best val_loss at any point during its training.
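This is roughly how I query the tuner afterwards (the variable names are just for illustration, but get_best_models and get_best_hyperparameters are the standard Keras Tuner calls):

# Retrieve the single best model and its hyperparameters after search();
# models/trials are ranked by the objective value the tuner recorded.
best_model = tuner.get_best_models(num_models=1)[0]
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.values)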
However, looking at the tuning log, I've noticed that the final reported best val_loss isn't actually the lowest val_loss seen during tuning. In the log below you can see how the "best so far" val_loss jumps up to 0.431 after trial 64, even though that trial had a much worse val_loss.
Here's an excerpt of the log (I've replaced the per-epoch training lines with ...):
Search: Running Trial #62
Hyperparameter |Value |Best Value So Far
lstm_reg |0.01 |0
lstm_units |384 |416
learning_rate |0.01741 |0.00062759
Epoch 1/200
58/58 - 8s - loss: 5.8378 - mean_absolute_error: 0.8131 - val_loss: 2.1253 - val_mean_absolute_error: 0.6561
...
Epoch 26/200
58/58 - 5s - loss: 0.4074 - mean_absolute_error: 0.4579 - val_loss: 0.8352 - val_mean_absolute_error: 0.5948
Trial 62 Complete [00h 02m 37s]
val_loss: 0.5230200886726379
Best val_loss So Far: 0.396116703748703
Total elapsed time: 04h 32m 29s
Search: Running Trial #63
Hyperparameter |Value |Best Value So Far
lstm_reg |0.001 |0
lstm_units |288 |416
learning_rate |0.00073415 |0.00062759
Epoch 1/200
58/58 - 5s - loss: 0.8142 - mean_absolute_error: 0.6041 - val_loss: 0.8935 - val_mean_absolute_error: 0.5796
...
Epoch 45/200
58/58 - 5s - loss: 0.1761 - mean_absolute_error: 0.2561 - val_loss: 0.8256 - val_mean_absolute_error: 0.6804
Trial 63 Complete [00h 04m 04s]
val_loss: 0.527589738368988
Best val_loss So Far: 0.396116703748703
Total elapsed time: 04h 36m 34s
Search: Running Trial #64
Hyperparameter |Value |Best Value So Far
lstm_reg |0.01 |0
lstm_units |384 |416
learning_rate |0.00011261 |0.00062759
Epoch 1/200
58/58 - 6s - loss: 4.1151 - mean_absolute_error: 0.6866 - val_loss: 3.3185 - val_mean_absolute_error: 0.4880
...
Epoch 94/200
58/58 - 6s - loss: 0.3712 - mean_absolute_error: 0.3964 - val_loss: 0.7933 - val_mean_absolute_error: 0.5781
Trial 64 Complete [00h 09m 06s]
val_loss: 0.6574578285217285
Best val_loss So Far: 0.43126755952835083
Total elapsed time: 04h 45m 40s
Search: Running Trial #65
Hyperparameter |Value |Best Value So Far
lstm_reg |0.0001 |0
lstm_units |480 |256
learning_rate |0.010597 |0.05
Epoch 1/200
58/58 - 6s - loss: 1.1511 - mean_absolute_error: 0.7090 - val_loss: 1.1972 - val_mean_absolute_error: 0.6724
...
The tuning summary states that the best val_loss is 0.400, even though at some point it must have found a model with a val_loss of 0.396, which is actually better (in trial 58, to be exact).
Best val_loss So Far: 0.4001617431640625
Total elapsed time: 15h 06m 02s
Hyperparameter search complete. Optimal parameters: ...
This is the code that creates the tuner:
import keras_tuner as kt

tuner = kt.BayesianOptimization(
    feedback_model_builder,
    objective="val_loss",
    directory="./model_tuning",
    project_name=name,
    max_trials=200,
)
and starts the tuning process:
tuner.search(
    multi_window.train,
    validation_data=multi_window.val,
    callbacks=[early_stopping],
    verbose=tf_verbosity,
    epochs=200,
)
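For context, early_stopping is a regular Keras EarlyStopping callback. The exact settings aren't the point of the question, but it's something along these lines (the monitor and patience values shown here are placeholders, not my exact configuration):

import tensorflow as tf

# Placeholder example: a standard EarlyStopping callback watching the
# validation loss; the patience value is arbitrary for illustration.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)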
Why does the "best" model not have the lowest val_loss encountered during the search? Am I misunderstanding how the tuner works, or is this a bug?
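For reference, one way to double-check what the tuner actually recorded per trial is to ask the oracle directly. This is just a sketch using the standard keras_tuner oracle API, not something from my original script:

# Print the objective score the oracle stored for the top trials.
for trial in tuner.oracle.get_best_trials(num_trials=10):
    print(trial.trial_id, trial.score)

# results_summary() prints a similar ranking of the best trials.
tuner.results_summary(num_trials=10)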