
I'm training a simple MLP using Watson Studio's HPO (hyperparameter optimization) capability. However, when I view my logs, the metrics are not displayed. Metrics logging works in a non-HPO training run, but the logs don't show up when running with HPO.

Here's how I defined my TensorBoard callback:

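import os
from tensorflow.keras.callbacks import TensorBoard  # standalone Keras: from keras.callbacks import TensorBoard

# Write all TensorBoard metrics for this job under $JOB_STATE_DIR/logs/tb
tb_directory = os.path.join(os.environ["JOB_STATE_DIR"], "logs", "tb")
os.makedirs(tb_directory, exist_ok=True)
tensorboard = TensorBoard(log_dir=tb_directory)

# model, the train/test arrays, batch_size and epochs are defined earlier in the script
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test),
                    callbacks=[tensorboard])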
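When metrics are missing, one way to narrow the problem down is to check which directories actually received TensorBoard event files. The sketch below uses only the standard library and relies on the `events.out.tfevents` file-name prefix that TensorBoard summary writers use; `find_event_dirs` is a hypothetical helper, not part of Watson Studio or Keras. If several HPO runs all report the same directory, their logs are landing in one place.

```python
import os
from collections import defaultdict


def find_event_dirs(root):
    """Map each directory under `root` to the TensorBoard event files it holds.

    Hypothetical helper: it only looks for files whose names start with
    "events.out.tfevents", the prefix TensorBoard summary writers use.
    A missing `root` simply yields an empty result (os.walk ignores errors).
    """
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.startswith("events.out.tfevents"):
                found[dirpath].append(name)
    return dict(found)
```

Called from inside the job, for example as `find_event_dirs(os.path.join(os.environ["JOB_STATE_DIR"], "logs"))`, it returns one entry per directory that contains event files, which can be printed to the job log.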
desertnaut
Biosopher

1 Answer


Found the answer. When running HPO, the metrics for each training run must be written to their own subdirectory; otherwise they are overwritten. So I should have set up my TensorBoard log directory like this:

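# Each HPO sub-run (identified by SUBID) gets its own directory under JOB_STATE_DIR.
# SUBID must come after JOB_STATE_DIR: os.path.join discards every component before an absolute one.
tb_directory = os.path.join(os.environ["JOB_STATE_DIR"], os.environ["SUBID"], "logs", "tb")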
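The order of the `os.path.join` arguments matters: when any later argument is an absolute path, `os.path.join` throws away everything before it, so a path built as `os.path.join(SUBID, JOB_STATE_DIR, ...)` with an absolute `JOB_STATE_DIR` silently loses the run ID. Below is a minimal sketch of a helper that keeps HPO sub-runs apart. It assumes that `JOB_STATE_DIR` is always set, that `SUBID` is set for HPO sub-runs (and absent otherwise), and that a `<JOB_STATE_DIR>/<SUBID>/logs/tb` layout is acceptable to Watson Studio; `run_log_dir` is a hypothetical name, not an IBM API.

```python
import os


def run_log_dir(env=None):
    """Return a TensorBoard log directory that is unique per HPO sub-run.

    Hypothetical helper. Assumes JOB_STATE_DIR is always set and that SUBID
    is set for HPO sub-runs; when SUBID is missing or empty, it falls back to
    the single-run layout $JOB_STATE_DIR/logs/tb.
    """
    env = os.environ if env is None else env
    parts = [env["JOB_STATE_DIR"]]
    subid = env.get("SUBID")
    if subid:
        # The run ID goes *after* JOB_STATE_DIR; placed first, it would be
        # discarded whenever JOB_STATE_DIR is an absolute path.
        parts.append(subid)
    parts += ["logs", "tb"]
    return os.path.join(*parts)


# The pitfall in isolation (Linux/macOS): the leading "2" is dropped
# because "/job" is an absolute path.
print(os.path.join("2", "/job", "logs", "tb"))  # /job/logs/tb
```

In the training script, `tb_directory = run_log_dir()` would replace the hard-coded path, followed by the same `os.makedirs(tb_directory, exist_ok=True)` and `TensorBoard(log_dir=tb_directory)` calls as in the question.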
Biosopher