Another possibility is to accumulate the summary over the test batches outside of TensorFlow and have a dummy variable in the graph to which you can then assign the result of the accumulation. As an example: say you compute the loss on the validation set over several batches and want a summary of the mean. You could achieve this in the following way:
# graph construction (assumes `import tensorflow as tf` and that this runs inside your model class)
with tf.name_scope('valid_loss'):
    v_loss = tf.Variable(tf.constant(0.0), trainable=False)
    # placeholder + assign op let you push an externally computed value into the graph
    self.v_loss_pl = tf.placeholder(tf.float32, shape=[], name='v_loss_pl')
    self.update_v_loss = tf.assign(v_loss, self.v_loss_pl, name='update_v_loss')
with tf.name_scope('valid_summaries'):
    v_loss_s = tf.summary.scalar('validation_loss', v_loss)
    self.valid_summaries = tf.summary.merge([v_loss_s], name='valid_summaries')
Then at evaluation time:
# accumulate the loss over all validation batches and average it outside of TensorFlow
total_loss = 0.0
for batch in all_batches:
    loss, _ = sess.run([get_loss, ...], feed_dict={...})
    total_loss += loss
total_loss /= float(n_batches)
# feed the averaged value back in, update the dummy variable, and fetch the summary
_, v_summary_str = sess.run([self.update_v_loss, self.valid_summaries],
                            feed_dict={self.v_loss_pl: total_loss})
writer.add_summary(v_summary_str)  # optionally pass global_step so points align with training
While this gets the job done, it admittedly feels a bit hacky. The streaming metric evaluation from contrib you posted might well be more elegant - I hadn't come across it before, so I'm curious to check it out.
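For reference, a rough sketch of what that streaming approach could look like using tf.metrics.mean (the TF 1.x successor to the contrib streaming ops). This is an assumption on my part, not tested code; it reuses get_loss, all_batches, sess and writer from the snippet above as stand-ins:

# minimal sketch, assuming TF 1.x and the same get_loss tensor as above
mean_loss, update_mean = tf.metrics.mean(get_loss, name='valid_mean_loss')
mean_loss_s = tf.summary.scalar('validation_loss_streaming', mean_loss)

# the metric keeps its running total/count in local variables,
# so reset them before each validation pass
sess.run(tf.local_variables_initializer())
for batch in all_batches:
    sess.run(update_mean, feed_dict={...})  # accumulates total and count in-graph
summary_str = sess.run(mean_loss_s)
writer.add_summary(summary_str)

The nice part is that the accumulation happens inside the graph, so the placeholder/assign plumbing disappears; the trade-off is having to remember to reset the metric's local variables between evaluation runs.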