
I'm confused about the meaning of epochs and steps. I have read the question "What is the difference between steps and epochs in TensorFlow?", but I'm still not sure about the answer. Consider this piece of code:

EVAL_EVERY_N_STEPS = 100
MAX_STEPS = 10000

nn = tf.estimator.Estimator(
        model_fn=model_fn,
        model_dir=args.model_path,
        params={"learning_rate": 0.001},
        config=tf.estimator.RunConfig())

for _ in range(MAX_STEPS // EVAL_EVERY_N_STEPS):
        print(_)

        nn.train(input_fn=train_input_fn,
                 hooks=[train_qinit_hook, step_cnt_hook],
                 steps=EVAL_EVERY_N_STEPS)

        if args.run_validation:
            results_val = nn.evaluate(input_fn=val_input_fn,
                                      hooks=[val_qinit_hook,
                                             val_summary_hook],
                                      steps=EVAL_STEPS)

            print('Step = {}; val loss = {:.5f};'.format(
                results_val['global_step'],
                results_val['loss']))

The number of training samples is 400. I take MAX_STEPS // EVAL_EVERY_N_STEPS to be the number of epochs (or iterations), which would make the number of epochs 100. What does the steps argument mean in nn.train?

nastaran
  • You don't need a for loop with an Estimator. It handles iteration itself, just as Keras does. Please read the TensorFlow tutorials carefully. – Sharky Apr 10 '19 at 09:28
  • This code is in fact part of the DLTK toolkit. I think the loop for _ in range(MAX_STEPS // EVAL_EVERY_N_STEPS): is meant to specify the number of epochs. – nastaran Apr 13 '19 at 07:58

1 Answer


In Deep Learning:

  • an epoch means one pass over the entire training set.
  • a step (or iteration) corresponds to one forward pass and one backward pass over a single batch, i.e. one parameter update.

If your dataset is passed to your algorithm whole, without being divided into batches, each step corresponds to one epoch. Usually, though, a training set is divided into N mini-batches; each step then processes one batch, and you need N steps to complete a full epoch.
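To make the batch/step relationship concrete, here is a minimal sketch in plain Python (no TensorFlow). The batches helper and the batch size of 4 are assumptions for illustration only; the question does not state the actual batch size.

```python
import math


def batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(400))  # stand-in for the 400 training samples
batch_size = 4              # assumed value, not given in the question

# One step consumes one mini-batch, so the number of batches in a
# full pass over the data is the number of steps per epoch.
steps_per_epoch = sum(1 for _ in batches(samples, batch_size))

print(steps_per_epoch)  # 100
assert steps_per_epoch == math.ceil(len(samples) / batch_size)
```

With a batch size equal to the dataset size, the same helper yields a single batch, which is the "one step per epoch" case.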

In your case, with 400 training samples, if batch_size == 4 then 100 steps are indeed equal to one epoch.

epochs = batch_size * steps // n_training_samples
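As a rough check, the formula can be applied to the constants from the question. This is only a sketch: BATCH_SIZE = 4 is an assumption (it depends on what train_input_fn actually produces), and epochs_completed is a hypothetical helper name. It relies on the steps argument of Estimator.train being incremental, so each call in the loop trains for another EVAL_EVERY_N_STEPS steps.

```python
def epochs_completed(batch_size, steps, n_training_samples):
    """Number of full passes over the training set after `steps` steps."""
    return batch_size * steps // n_training_samples


EVAL_EVERY_N_STEPS = 100
MAX_STEPS = 10000
N_TRAINING_SAMPLES = 400
BATCH_SIZE = 4  # assumed value, not given in the question

# The outer loop calls nn.train this many times ...
n_train_calls = MAX_STEPS // EVAL_EVERY_N_STEPS
# ... and each call runs EVAL_EVERY_N_STEPS more steps.
total_steps = n_train_calls * EVAL_EVERY_N_STEPS

epochs_per_call = epochs_completed(
    BATCH_SIZE, EVAL_EVERY_N_STEPS, N_TRAINING_SAMPLES)
total_epochs = epochs_completed(
    BATCH_SIZE, total_steps, N_TRAINING_SAMPLES)

print(n_train_calls, total_steps)     # 100 10000
print(epochs_per_call, total_epochs)  # 1 100
```

So the loop count only matches the number of epochs when one call to nn.train happens to cover exactly one pass over the data; with a different batch size the two numbers diverge.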

Olivier Dehaene