Note that this colab notebook is a very simplified version of how TF-Agents actually works. In practice you should use a Driver to sample trajectories instead of manually calling

action_step = agent.policy.action(time_step)
time_step = env.step(action_step.action)

at every iteration. Another advantage of the Driver is that it provides easy compatibility with all the metrics in TF-Agents.
As for your question, here is how:
At the beginning of your training, define a summary writer with something like:

import os
import tensorflow as tf

train_dir = os.path.join(root_dir, 'train')
train_summary_writer = tf.summary.create_file_writer(
    train_dir, flush_millis=10000)
train_summary_writer.set_as_default()
Now every time you call agent.train(), its summaries will be written to this default writer and flushed to the train_dir TensorBoard folder (every 10 seconds here, per flush_millis=10000).
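You can then watch training live by pointing TensorBoard at that directory, e.g. tensorboard --logdir root_dir from a terminal (or the %tensorboard magic in Colab).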
To add some metrics into the mix, simply define them with something like:

from tf_agents.metrics import tf_metrics

train_metrics = [
    tf_metrics.NumberOfEpisodes(),
    tf_metrics.EnvironmentSteps(),
    tf_metrics.AverageReturnMetric(buffer_size=collect_episodes_per_epoch),
    tf_metrics.AverageEpisodeLengthMetric(buffer_size=collect_episodes_per_epoch),
]
Pass them to the Driver as observers, together with your replay buffer's observer, like this:

from tf_agents.drivers import dynamic_episode_driver

dynamic_episode_driver.DynamicEpisodeDriver(
    tf_env,
    collect_policy,
    observers=replay_observer + train_metrics,
    num_episodes=collect_episodes_per_epoch).run()
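Here replay_observer is just a list wrapping whatever writes experience into your replay buffer; a minimal sketch, assuming a TFUniformReplayBuffer named replay_buffer:

replay_observer = [replay_buffer.add_batch]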
And after this, log them to your summaries with:

for train_metric in train_metrics:
    train_metric.tf_summaries(
        train_step=epoch_counter, step_metrics=train_metrics[:2])
In case you're wondering, the step_metrics arg makes the last two metrics (average return and average episode length) also get plotted against the first two (number of episodes and environment steps), instead of only against train_step.
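Putting it all together, one training epoch could look like the sketch below. This is only an illustration: agent, tf_env, collect_policy, replay_buffer, and num_epochs are assumed to come from your own setup, epoch_counter is assumed to be a tf.Variable, and the gather_all/clear pattern matches an on-policy agent such as REINFORCE.

# Sketch of an epoch loop; the driver is built once and reused.
collect_driver = dynamic_episode_driver.DynamicEpisodeDriver(
    tf_env,
    collect_policy,
    observers=replay_observer + train_metrics,
    num_episodes=collect_episodes_per_epoch)

for _ in range(num_epochs):
    collect_driver.run()                     # collect episodes and update metrics
    experience = replay_buffer.gather_all()  # read the collected trajectories
    agent.train(experience)                  # summaries go to the default writer
    replay_buffer.clear()
    epoch_counter.assign_add(1)
    for train_metric in train_metrics:
        train_metric.tf_summaries(
            train_step=epoch_counter, step_metrics=train_metrics[:2])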