
I'm trying to learn how to use TensorFlow and TensorBoard. I have a test project based on the MNIST neural net tutorial.

In my code, I construct a node that calculates the fraction of digits in a data set that are correctly classified, like this:

# True where the highest-scoring logit matches the label.
correct = tf.nn.in_top_k(self._logits, labels, 1)
# Cast the booleans to 0.0/1.0 and average them to get the accuracy.
correct = tf.to_float(correct)
accuracy = tf.reduce_mean(correct)

Here, self._logits is the inference part of the graph, and labels is a placeholder that contains the correct labels.

Now, what I would like to do is evaluate the accuracy for both the training set and the validation set as training proceeds. I can do this by running the accuracy node twice, with different feed_dicts:

train_acc = sess.run(accuracy, feed_dict={images: training_set.images, labels: training_set.labels})
valid_acc = sess.run(accuracy, feed_dict={images: validation_set.images, labels: validation_set.labels})

This works as intended: I can print the values and see that initially both accuracies increase, and that eventually the validation accuracy flattens out while the training accuracy keeps increasing.

However, I would also like to get graphs of these values in TensorBoard, and I cannot figure out how to do this. If I simply attach a scalar_summary to accuracy, the logged values will not distinguish between the training set and the validation set.

I also tried creating two identical accuracy nodes with different names, running one on the training set and one on the validation set, and attaching a scalar_summary to each of them. This does give me two graphs in TensorBoard, but instead of one graph showing the training-set accuracy and one showing the validation-set accuracy, both show identical values that match neither of the ones printed to the terminal.

I am probably misunderstanding how to solve this problem. What is the recommended way of separately logging the output from a single node for different inputs?

user3468216

2 Answers


There are several different ways you could achieve this, but you're on the right track with creating different tf.summary.scalar() nodes. Since you must explicitly call SummaryWriter.add_summary() each time you want to log a quantity to the event file, the simplest approach is probably to fetch the appropriate summary node each time you want to get the training or validation accuracy:

accuracy = tf.reduce_mean(correct)

# Two summary ops that read the same accuracy tensor but log it under
# different tags, so TensorBoard draws a separate curve for each.
training_summary = tf.summary.scalar("training_accuracy", accuracy)
validation_summary = tf.summary.scalar("validation_accuracy", accuracy)


summary_writer = tf.summary.FileWriter(...)

for step in xrange(NUM_STEPS):

  # Perform a training step....

  if step % LOG_PERIOD == 0:

    # To log training accuracy.
    train_acc, train_summ = sess.run(
        [accuracy, training_summary],
        feed_dict={images: training_set.images, labels: training_set.labels})
    summary_writer.add_summary(train_summ, step)

    # To log validation accuracy.
    valid_acc, valid_summ = sess.run(
        [accuracy, validation_summary],
        feed_dict={images: validation_set.images, labels: validation_set.labels})
    summary_writer.add_summary(valid_summ, step)

Alternatively, you could create a single summary op whose tag is a tf.placeholder(tf.string, []) and feed the string "training_accuracy" or "validation_accuracy" as appropriate.
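For what it's worth, a minimal sketch of that variant might look like the following. It assumes the legacy tf.scalar_summary() op, whose tag argument could be a string tensor; as a comment below notes, the newer tf.summary.scalar() requires a plain Python string, so this no longer works in recent releases:

# Sketch only: relies on the legacy tf.scalar_summary(), whose tag
# argument could be a tensor (unlike tf.summary.scalar()).
tag = tf.placeholder(tf.string, [])
tagged_summary = tf.scalar_summary(tag, accuracy)

# Inside the training loop, feed a different tag for each data set.
train_summ = sess.run(tagged_summary, feed_dict={
    tag: "training_accuracy",
    images: training_set.images, labels: training_set.labels})
summary_writer.add_summary(train_summ, step)

valid_summ = sess.run(tagged_summary, feed_dict={
    tag: "validation_accuracy",
    images: validation_set.images, labels: validation_set.labels})
summary_writer.add_summary(valid_summ, step)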

mrry
  • Thanks! This was exactly what I was looking for! My problem was that I was trying to use a single call to `merge_all_summaries` rather than doing `add_summary` for every summary. The documentation seems to suggest that using `merge_all_summaries` is preferred over individual calls to `add_summary`, but in this case the manual way seems better. – user3468216 Dec 27 '15 at 09:39
  • That's correct: `merge_all_summaries` is a "one size fits all" way to do things, but individual calls to `add_summary` give you much more control. (For what it's worth, we typically set up separate processes to do training and validation, where the validation task has its own - slightly different - graph and loads in the latest model checkpoint periodically.) – mrry Dec 27 '15 at 23:44
  • @mrry Is there a tutorial on how to run validation while referring to training checkpoints? – piRSquared Aug 11 '16 at 23:45
  • Is it possible to call merge_all_summaries() first for the other training-related operations, and then use a separate add_summary for the validation set only? – Vitt Volt Dec 13 '16 at 22:43
  • Does that mean I can make the placeholder tag and stick to `merge_all_summaries()`? – Martijn Courteaux Apr 04 '18 at 13:11
  • The `tf.placeholder(tf.string, [])` solution doesn't appear to work anymore (or at least not in tensorflow==1.13.2). It looks like the tag/name must be a Python string now. – mathandy Sep 22 '19 at 21:49

Another way to do it is to use a second FileWriter, so that you can keep using a single merged summary op (tf.summary.merge_all()).

train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                      sess.graph)
test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
tf.global_variables_initializer().run()

This works fine for me. The complete documentation is here: TensorBoard: Visualizing Learning. A usage sketch follows below.
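Here is a short sketch of how the two writers might then be used in the training loop. It assumes the placeholders (images, labels) and loop constants (NUM_STEPS, LOG_PERIOD) from the question and the first answer; they are not part of the linked tutorial:

# Merge every summary op in the graph into a single op.
merged = tf.summary.merge_all()

for step in xrange(NUM_STEPS):
  # Perform a training step....

  if step % LOG_PERIOD == 0:
    # Evaluate the same merged op on different feeds, and write each
    # result to its own directory; TensorBoard shows them as two runs.
    train_summ = sess.run(merged, feed_dict={
        images: training_set.images, labels: training_set.labels})
    train_writer.add_summary(train_summ, step)

    test_summ = sess.run(merged, feed_dict={
        images: validation_set.images, labels: validation_set.labels})
    test_writer.add_summary(test_summ, step)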

stillPatrick