3

I am using the train_and_evaluate function in TensorFlow and want the eval step to run more frequently (based on either the global step or the time elapsed). This is my code (the model function is not shown).

def get_classifier(batch_size):
    config = tf.estimator.RunConfig(
        model_dir="models/shape_model_cnn_3",
        save_checkpoints_secs=300,
        save_summary_steps=100)

    params = tf.contrib.training.HParams(
        batch_size=batch_size,
        num_conv=[48,64,96], # Sizes of each convolutional layer
        conv_len=[2,3,4], # Kernel size of each convolutional layer
        num_nodes=128, # Number of LSTM nodes for each LSTM layer
        num_layers=3, # Number of LSTM layers
        num_classes=7, # Number of classes in final layer
        learning_rate=0.0001,
        gradient_clipping_norm=9.0,
        dropout=0.3)

    classifier = tf.estimator.Estimator(
        model_fn=my_model,
        config=config,
        params=params
    )

    return classifier

classifier = get_classifier(8)

train_spec = tf.estimator.TrainSpec(
    input_fn=lambda:input.batch_dataset("dataset/shape-train-???.tfrecords", tf.estimator.ModeKeys.TRAIN, 8),
    max_steps=100000
)

eval_spec = tf.estimator.EvalSpec(
    input_fn=lambda:input.batch_dataset("dataset/shape-eval-???.tfrecords", tf.estimator.ModeKeys.EVAL, 8)
)

tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)

I have tried using the start_delay_secs parameter in my EvalSpec. I'm not sure if this is what it is for, but it doesn't seem to have any effect anyway.

BenJacob

4 Answers

1

When I set save_checkpoints_steps, evaluation does run after the specified number of steps. This configuration:

tf.estimator.RunConfig(save_summary_steps=5, log_step_count_steps=3, save_checkpoints_steps=40)

runs an evaluation every 40 steps.
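
To reason about how often evaluation would fire, here is a minimal sketch in plain Python (no TensorFlow needed). It is a simplified model, not the library's code. It assumes the behaviour I understand recent TF 1.x releases to have in local mode: evaluation is triggered by each new checkpoint, and skipped when fewer than throttle_secs seconds have passed since the previous evaluation. The function name and the secs_per_step parameter are made up for illustration.

```python
def simulate_eval_steps(total_steps, save_checkpoints_steps, throttle_secs, secs_per_step):
    """Return the global steps at which evaluation would run in this simplified model.

    Assumptions (not from the original post): a checkpoint is written every
    `save_checkpoints_steps` steps, every step takes `secs_per_step` seconds,
    and a new checkpoint triggers evaluation only if at least `throttle_secs`
    seconds have passed since the previous evaluation. The time spent
    evaluating is ignored.
    """
    eval_steps = []
    last_eval_time = None
    for step in range(save_checkpoints_steps, total_steps + 1, save_checkpoints_steps):
        now = step * secs_per_step  # simulated wall-clock time of this checkpoint
        if last_eval_time is None or now - last_eval_time >= throttle_secs:
            eval_steps.append(step)
            last_eval_time = now
    return eval_steps


# No throttling: every checkpoint is evaluated.
print(simulate_eval_steps(200, 40, 0, 1))    # [40, 80, 120, 160, 200]
# Throttle of 100 s with 1 s per step: some checkpoints are skipped.
print(simulate_eval_steps(200, 40, 100, 1))  # [40, 160]
```

If this model holds for your version, a small save_checkpoints_steps only gives an evaluation at every checkpoint when throttle_secs in the EvalSpec is short compared with the time between checkpoints.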

ch9lb
0

You can set max_steps to a lower number in order to evaluate sooner.

This will reset the input function. Currently, there is no way to pause the input function and resume at the same state using estimator. We are looking into adding this feature.

kww
  • Thanks for your response, the max_steps parameter terminates the training, correct? The train and evaluate cycle already evaluates periodically, is there no way to change the frequency of this? – BenJacob Apr 24 '18 at 07:22
  • You can increase the value passed into max_steps. An alternate solution (which might be closer to what you're looking for) is having the training dataset terminate sooner using the `.take()` function. – kww Apr 24 '18 at 22:00
0

I have found that EvalSpec has a parameter, `throttle_secs`, which sets the minimum number of seconds between evaluations. Alternatively, if you want to evaluate based on a number of steps, you can use a for loop and incrementally increase max_steps, as suggested by @Kathy Wu.
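
A rough sketch of that for loop, in case it helps. The helper name and its parameters are hypothetical. It only assumes the estimator object has train(input_fn=..., max_steps=...) and evaluate(input_fn=...) methods, which tf.estimator.Estimator provides, so the helper itself does not import TensorFlow.

```python
def train_with_periodic_eval(estimator, train_input_fn, eval_input_fn,
                             total_steps, eval_every_steps):
    """Alternate training and evaluation every `eval_every_steps` global steps.

    `estimator` is assumed to expose `train(input_fn=..., max_steps=...)` and
    `evaluate(input_fn=...)`, as tf.estimator.Estimator does.
    Returns a list of (max_steps, evaluation_result) pairs.
    """
    results = []
    # Raise the max_steps target in increments, capping it at total_steps.
    for target in range(eval_every_steps, total_steps + eval_every_steps, eval_every_steps):
        max_steps = min(target, total_steps)
        # max_steps is a global-step target, so training resumes from the
        # latest checkpoint and stops once that step is reached.
        estimator.train(input_fn=train_input_fn, max_steps=max_steps)
        results.append((max_steps, estimator.evaluate(input_fn=eval_input_fn)))
    return results
```

With the objects from the question this could be called as train_with_periodic_eval(classifier, train_spec.input_fn, eval_spec.input_fn, 100000, 1000), where 1000 is just an example interval. Note that each train call restarts the input function, as mentioned in the other answer.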

BenJacob
0

Use tf.contrib.learn.Experiment instead.

For example:

import tensorflow as tf
from tensorflow.contrib.learn import learn_runner

# learn_runner.run expects a function that builds the Experiment,
# not an Experiment instance.
def experiment_fn(run_config, hparams):
    return tf.contrib.learn.Experiment(
        estimator=estimator,  # Estimator
        train_input_fn=train_input_fn,  # First-class function
        eval_input_fn=eval_input_fn,  # First-class function
        train_steps=hparams.train_steps,  # Minibatch steps
        min_eval_frequency=hparams.min_eval_frequency,  # Eval frequency
        train_monitors=[train_input_hook],  # Hooks for training
        eval_hooks=[eval_input_hook],  # Hooks for evaluation
        eval_steps=None  # Use evaluation feeder until it's empty
    )

learn_runner.run(
    experiment_fn=experiment_fn,  # First-class function
    run_config=run_config,  # RunConfig
    schedule="train_and_evaluate",  # What to run
    hparams=params  # HParams
)
LeckieNi
  • 2
    tf.contrib.learn.Experiment is now deprecated. See: https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment Tensorflow recommends switching to tf.estimator.train_and_evaluate – Hlynur Freyr Jónsson Jan 14 '19 at 13:08