
I'm training a model (a generative adversarial network) on a dataset using TensorFlow, and I would like to save the model's parameters every 50 epochs.

Let's say I want to train the model for 1000 epochs and save the model's parameters every 50 epochs, which should result in 20 different checkpoint files.

Given a Session and a Saver object, I use the following code to do so.

if num_epoch % 50 == 0:
    # save a numbered checkpoint every 50 epochs
    saver.save(sess=sess, save_path='RGAN-1/sv/' + type_exp, global_step=num_epoch)

The problem is that the checkpoints are getting overwritten: at the end of the experiment I only have the last 6 checkpoints, while I should have 20.

I have no idea why this is happening.

Redfox-Codder
  • Possible duplicate of [Tensorflow checkpoint models getting deleted](https://stackoverflow.com/questions/41018454/tensorflow-checkpoint-models-getting-deleted) – Lio Sep 20 '19 at 16:06

1 Answer


tf.train.Saver has a max_to_keep argument that defaults to 5, so only the most recent checkpoints are retained. You can pass 0 to keep all checkpoints:

saver = tf.train.Saver(..., max_to_keep=0)

See the docs for a full argument list.
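As a minimal sketch of how this fits into the training loop from the question (reusing type_exp, num_epoch, and the RGAN-1/sv/ path from the question's snippet; the loop body itself is just a placeholder):

```python
import tensorflow as tf

# max_to_keep=0 disables the automatic deletion of older checkpoints.
saver = tf.train.Saver(max_to_keep=0)

type_exp = 'experiment'  # illustrative value, mirrors the question's variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for num_epoch in range(1, 1001):
        # ... run the training step(s) for this epoch ...
        if num_epoch % 50 == 0:
            # writes checkpoints such as RGAN-1/sv/experiment-50, -100, ..., -1000
            saver.save(sess=sess,
                       save_path='RGAN-1/sv/' + type_exp,
                       global_step=num_epoch)
```

With this setting you should end up with all 20 checkpoints instead of only the last few.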

xdurch0