
I'm quite familiar with TensorFlow 1.x and I'm considering switching to TensorFlow 2 for an upcoming project. I'm having trouble understanding how to write scalars to TensorBoard logs with eager execution, using a custom training loop.

Problem description

In tf1 you would create some summary ops (one op for each thing you wanted to store), merge them into a single op, run that merged op inside a session, and then write the result to a file using a FileWriter object. Assuming sess is our tf.Session(), an example of how this worked can be seen below:

# While defining our computation graph, define summary ops:
# ... some ops ...
tf.summary.scalar('scalar_1', scalar_1)
# ... some more ops ...
tf.summary.scalar('scalar_2', scalar_2)
# ... etc.

# Merge all these summaries into a single op:
merged = tf.summary.merge_all()

# Define a FileWriter (i.e. an object that writes summaries to files):
writer = tf.summary.FileWriter(log_dir, sess.graph)

# Inside the training loop run the op and write the results to a file:
for i in range(num_iters):
    summary, ... = sess.run([merged, ...], ...)
    writer.add_summary(summary, i)

The problem is that sessions don't exist anymore in tf2, and I would prefer not to disable eager execution to make this work. The official documentation is written for tf1, and all references I can find suggest using the TensorBoard Keras callback. However, as far as I know, that only works if you train the model through model.fit(...), not through a custom training loop.
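
For reference, the callback approach looks roughly like the sketch below (model and train_set are placeholders for an already-compiled Keras model and a dataset), which is exactly what I can't use with a custom loop:

# Placeholder setup: only works when training through model.fit()
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
model.fit(train_set, epochs=10, callbacks=[tensorboard_cb])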

What I've tried

  • The tf1 versions of the tf.summary functions, outside of a session. Obviously any combination of these fails, as FileWriters, merge ops, etc. don't even exist in tf2.
  • This Medium post states that there has been a "cleanup" of some TensorFlow APIs, including tf.summary(). It suggests importing from tensorflow.python.ops.summary_ops_v2, which doesn't seem to work. This implies using record_summaries_every_n_global_steps; more on this later.
  • A series of other posts 1, 2, 3, suggest using tf.contrib.summary and tf.contrib.FileWriter. However, tf.contrib has been removed from the core TensorFlow repository and build process.
  • A TensorFlow v2 showcase from the official repo, which again uses the tf.contrib summaries along with the record_summaries_every_n_global_steps mentioned previously. I couldn't make this work either (even without using the contrib library).

tl;dr

My questions are:

  • Is there a way to properly use tf.summary in TensorFlow 2?
  • If not, is there another way to write TensorBoard logs in TensorFlow 2, when using a custom training loop (not model.fit())?
Javier
  • For adding 2 scalars in one plot https://stackoverflow.com/questions/58181527/merging-2-plots-in-tensorboard-2-with-tensorflow-2 – Vinod prime Nov 02 '19 at 07:38

1 Answer


Yes, there is a simpler and more elegant way to use summaries in TensorFlow v2.

First, create a file writer that stores the logs (e.g. in a directory named log_dir):

writer = tf.summary.create_file_writer(log_dir)

Anywhere you want to write something to the log file (e.g. a scalar) use your good old tf.summary.scalar inside a context created by the writer. Suppose you want to store the value of scalar_1 for step i:

with writer.as_default():
    tf.summary.scalar('scalar_1', scalar_1, step=i)

You can open as many of these contexts as you like inside or outside of your training loop.
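
For instance, entering the context once around the whole loop works just as well. A minimal sketch (train_step here is a placeholder for whatever computes your loss):

with writer.as_default():
    for i in range(num_iters):
        loss = train_step()  # placeholder: compute the loss for this step
        tf.summary.scalar('loss', loss, step=i)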

Example:

# create the file writer object
writer = tf.summary.create_file_writer(log_dir)

for i, (x, y) in enumerate(train_set):

    with tf.GradientTape() as tape:
        y_ = model(x)
        loss = loss_func(y, y_)

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # write the loss value
    with writer.as_default():
        tf.summary.scalar('training loss', loss, step=i+1)
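
To visualize the results, point TensorBoard at the same log directory (substitute your own path):

tensorboard --logdir log_dir

If some scalars seem to be missing at the end of training, calling writer.flush() after the loop forces any buffered events to be written to disk.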
Djib2011
  • Thanks, that works! I can't believe they don't have any documentation for this! – Javier Jul 10 '19 at 07:35
  • @mathtick one possible solution is to make two different subfolders (e.g. 'training' and 'validation'). If you pass the parent folder to tensorboard you'll get a run for each subfolder on the same plot. – EdoG Nov 25 '19 at 14:44
  • Why doesn't this work when using graph execution with `@tf.function`? – AleB Mar 25 '20 at 17:36
  • The commands shown in the example should work fine in graph mode. Maybe something else in your graph is causing the issue. You could look at an example of this [here](https://www.tensorflow.org/tensorboard/migrate). – Djib2011 Mar 25 '20 at 19:39