
As stated in this question:

The TensorFlow documentation does not provide any example of how to perform a periodic evaluation of the model on an evaluation set.

The accepted answer suggested the use of Experiment (which is deprecated according to this README).

Everything I found online points towards using the train_and_evaluate method. However, I still do not see how to switch between the two processes (train and evaluate). I have tried the following:

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    params=hparams,
    model_dir=model_dir,
    config=tf.estimator.RunConfig(
        save_checkpoints_steps=2000,
        save_summary_steps=100,
        keep_checkpoint_max=5
    )
)

train_input_fn = lambda: input_fn(
    train_file, #a .tfrecords file
    train=True,
    batch_size=70,
    num_epochs=100
)

eval_input_fn = lambda: input_fn(
    val_file, # another .tfrecords file
    train=False,
    batch_size=70,
    num_epochs=1
)
train_spec = tf.estimator.TrainSpec(
    train_input_fn,
    max_steps=125
)    

eval_spec = tf.estimator.EvalSpec(
    eval_input_fn,
    steps=30,
    name='validation',
    start_delay_secs=150,
    throttle_secs=200
)

tf.logging.info("start experiment...")
tf.estimator.train_and_evaluate(
    estimator,
    train_spec,
    eval_spec
)

Here is what I think my code should be doing:

Train the model for 100 epochs using a batch size of 70; save checkpoints every 2000 batches; save summaries every 100 batches; keep at most 5 checkpoints; after 150 batches on the training set, compute the validation error using 30 batches of validation data

However, I get the following logs:

INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /output/model.ckpt.
INFO:tensorflow:loss = 39.55082, step = 1
INFO:tensorflow:global_step/sec: 178.622
INFO:tensorflow:loss = 1.0455043, step = 101 (0.560 sec)
INFO:tensorflow:Saving checkpoints for 150 into /output/model.ckpt.
INFO:tensorflow:Loss for final step: 0.8327793.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-04-02-22:49:15
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /projects/MNIST-GCP/output/model.ckpt-150
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [3/30]
INFO:tensorflow:Evaluation [6/30]
INFO:tensorflow:Evaluation [9/30]
INFO:tensorflow:Evaluation [12/30]
INFO:tensorflow:Evaluation [15/30]
INFO:tensorflow:Evaluation [18/30]
INFO:tensorflow:Evaluation [21/30]
INFO:tensorflow:Evaluation [24/30]
INFO:tensorflow:Evaluation [27/30]
INFO:tensorflow:Evaluation [30/30]
INFO:tensorflow:Finished evaluation at 2018-04-02-22:49:15
INFO:tensorflow:Saving dict for global step 150: accuracy = 0.8552381, global_step =150, loss = 0.95031387

From the logs, it seems that training stops after the first evaluation. What am I missing from the documentation? Could you explain to me how I should have implemented what I think my code is doing?

Additional info: I am running everything using the MNIST dataset, which has 50,000 images in the training set, so (I think) the model should run for num_epochs * 50,000 / batch_size ≈ 71,000 steps.

I sincerely appreciate your help!

EDIT: After running experiments, I realize that max_steps controls the number of steps of the whole training procedure, not just the number of steps to take before computing the metrics on the test set. Reading tf.estimator.Estimator.train, I see it has a steps argument, which works incrementally and is bounded by max_steps; however, tf.estimator.TrainSpec does not have a steps argument, which means I cannot control the number of steps to take before computing metrics on the validation set.
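
For illustration, here is a minimal sketch of that incremental behaviour with explicit train/evaluate calls, reusing the estimator, train_input_fn and eval_input_fn defined above; the step counts are arbitrary values chosen for the example, not values taken from the documentation:

total_steps = 7000          # roughly num_epochs * 50,000 / batch_size
eval_every_n_steps = 2000   # arbitrary interval for this sketch

steps_done = 0
while steps_done < total_steps:
    # steps is incremental: each call trains for eval_every_n_steps more steps
    estimator.train(input_fn=train_input_fn, steps=eval_every_n_steps)
    steps_done += eval_every_n_steps

    # evaluate 30 batches of validation data from the latest checkpoint
    metrics = estimator.evaluate(
        input_fn=eval_input_fn, steps=30, name='validation')
    tf.logging.info("validation metrics at step %d: %s", steps_done, metrics)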

srcolinas

3 Answers


From my understanding, evaluation happens on a model restored from the latest checkpoint. In your case, you don't save a checkpoint until 2000 steps. You also set max_steps=125, which takes precedence over the dataset you feed your model.

Therefore, even though you specify a batch size of 70 and 100 epochs, your model stops training at 125 steps, which is well below the checkpoint interval of 2000 steps; this in turn limits evaluation, because evaluation depends on the checkpointed model.

Note that by default, evaluation happens with every checkpoint save, assuming you don't set a throttle_secs limit.
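
As an illustration of that interaction, here is a minimal sketch (reusing the question's model_fn and input functions; the numbers are assumptions chosen for the example, not a prescribed fix) in which checkpoints are saved often enough for periodic evaluation to trigger:

run_config = tf.estimator.RunConfig(
    save_checkpoints_steps=500,   # smaller than max_steps, so several checkpoints are written
    keep_checkpoint_max=5
)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    params=hparams,
    model_dir=model_dir,
    config=run_config
)

train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps=7000)

eval_spec = tf.estimator.EvalSpec(
    eval_input_fn,
    steps=30,
    name='validation',
    start_delay_secs=0,
    throttle_secs=0   # evaluate whenever a new checkpoint appears
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)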

Michael Du

In fact, every 200 seconds, or when your training has finished, the estimator will switch from the training phase to the evaluation one.

However, we can see in your code that you reach the 125 steps before the evaluation starts, which means that your training has finished. max_steps is the total number of training steps before stopping; it has no link with the number of epochs (since epochs are not used by tf.estimator.train_and_evaluate). During training, your evaluation metrics will be computed every throttle_secs (=200 here).

As for the metrics, you can add them inside your model_fn with:

predict = tf.nn.softmax(logits, name="softmax_tensor")
classes = tf.cast(tf.argmax(predict, 1), tf.uint8)

def conv_model_eval_metrics(classes, labels, mode):
    # tf.metrics.* expect (labels, predictions), in that order
    if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:
        return {
            'accuracy': tf.metrics.accuracy(labels, classes),
            'precision': tf.metrics.precision(labels, classes),
            'recall': tf.metrics.recall(labels, classes),
        }
    else:
        return None

eval_metrics = conv_model_eval_metrics(classes, labels, mode)
with tf.variable_scope("performance_metrics"):
    # Accuracy is the most intuitive performance measure: the ratio of
    # correctly predicted observations to the total observations.
    tf.summary.scalar('accuracy', eval_metrics['accuracy'][1])

    # How many selected items are relevant: precision is the ratio of correctly
    # predicted positive observations to the total predicted positive observations.
    tf.summary.scalar('precision', eval_metrics['precision'][1])

    # How many relevant items are selected: recall is the ratio of correctly
    # predicted positive observations to all observations in the actual class.
    tf.summary.scalar('recall', eval_metrics['recall'][1])

This works well for following precision, recall, and accuracy on TensorBoard during training and evaluation.
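
For context, here is a sketch of how those metrics would typically be handed back to the Estimator so that the evaluation phase of train_and_evaluate reports them; the loss and train_op below are assumed to exist elsewhere in the model_fn and are not part of the snippet above:

# inside the same model_fn, after computing loss and train_op (assumed);
# passing the metrics dict as eval_metric_ops makes the evaluation phase
# report accuracy/precision/recall alongside the loss
return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions={'classes': classes, 'probabilities': predict},
    loss=loss,
    train_op=train_op if mode == tf.estimator.ModeKeys.TRAIN else None,
    eval_metric_ops=eval_metrics
)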

PS: Sorry, this is my first answer, which is why it is a bit rough to read ^^

Romain
  • Thank you for your answer! Although it is useful, it does not answer the question. I will post what I think is the answer from some experiments I've run – srcolinas Apr 04 '18 at 15:34

One can control the repetitions with the tf.data.Dataset.repeat(num_epochs) one sets in the input_fn(). The training function will run until the number of epochs is consumed, then the evaluation function will run, then the training function will run again for the same number of epochs, and so on; finally, train_and_evaluate will stop when the max_steps defined in TrainSpec is reached.

This is a conclusion I draw from a few experiments, corrections are welcome.
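
As an illustration of that behaviour, here is a minimal sketch of an input_fn built on tf.data; the feature parsing is a placeholder assumption, since the question does not show how the .tfrecords files were written:

def input_fn(filename, train, batch_size, num_epochs):
    def _parse(serialized):
        # placeholder feature spec; the real one depends on the .tfrecords layout
        features = tf.parse_single_example(serialized, {
            'image': tf.FixedLenFeature([784], tf.float32),
            'label': tf.FixedLenFeature([], tf.int64),
        })
        return features['image'], features['label']

    dataset = tf.data.TFRecordDataset(filename).map(_parse)
    if train:
        dataset = dataset.shuffle(buffer_size=10000)
    # repeat(num_epochs) bounds how long each training run lasts before
    # train_and_evaluate switches to the evaluation phase
    dataset = dataset.repeat(num_epochs).batch(batch_size)
    return dataset.make_one_shot_iterator().get_next()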

srcolinas