I have recently experimented with Google's seq2seq to set up a small NMT system. I managed to get everything working, but I am still wondering about the exact difference between the number of epochs and the number of training steps of a model.

If I am not mistaken, one epoch consists of multiple training steps and is complete once the whole training set has been processed once. What I do not understand is how the two relate in the documentation of Google's own tutorial on NMT. Note the last line of the following snippet.

export DATA_PATH=

export VOCAB_SOURCE=${DATA_PATH}/vocab.bpe.32000
export VOCAB_TARGET=${DATA_PATH}/vocab.bpe.32000
export TRAIN_SOURCES=${DATA_PATH}/train.tok.clean.bpe.32000.en
export TRAIN_TARGETS=${DATA_PATH}/train.tok.clean.bpe.32000.de
export DEV_SOURCES=${DATA_PATH}/newstest2013.tok.bpe.32000.en
export DEV_TARGETS=${DATA_PATH}/newstest2013.tok.bpe.32000.de

export DEV_TARGETS_REF=${DATA_PATH}/newstest2013.tok.de
export TRAIN_STEPS=1000000

It seems to me that you can only define the number of training steps, not the number of epochs. Is it possible that the two terms overlap, so that there is no need to define a number of epochs at all?

Bram Vanroy
milvala
  • You answered your own question. What in the linked tutorial contradicts what you said? We're not going to read all of it. – interjay Apr 10 '17 at 09:53
  • It only lets you export a certain number of training steps, so I was wondering whether there is still a need to specify a number of epochs as well. – milvala Apr 10 '17 at 09:57
  • An epoch is a fixed number of steps, so defining one defines the other. – interjay Apr 10 '17 at 10:47
  • So, just to be sure I got it: if your training data consists of 200,000 sentences and you set 1,000,000 training steps, you'll end up with the equivalent of 5 epochs? – milvala Apr 10 '17 at 12:03

1 Answer

An epoch consists of going through all your training samples once. One step (or iteration) refers to training on a single minibatch. So if you have 1,000,000 training samples and use a batch size of 100, one epoch is equivalent to 10,000 steps, with 100 samples per step.
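The arithmetic above can be written out as a small sketch. The function name `steps_per_epoch` is made up for illustration and is not part of seq2seq or any other library. It also assumes that a final, smaller batch still counts as one step, which is why it rounds up.

```python
def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of minibatch steps needed to see every training sample once.

    Hypothetical helper for illustration only. Assumes a trailing partial
    batch is still processed, so the division is rounded up.
    """
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return (num_samples + batch_size - 1) // batch_size


# The example from the answer: 1,000,000 samples with a batch size of 100.
print(steps_per_epoch(1_000_000, 100))  # 10000
```

If the sample count is not a multiple of the batch size, the last step of each epoch simply sees fewer samples. A framework that drops the incomplete batch would round down instead.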

A high-level neural network framework may let you set either the number of epochs or the total number of training steps. But you can't set both independently: once the batch size and the number of training samples are fixed, one directly determines the other.
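To make the conversion concrete, here is a minimal sketch that goes from a total step count to epochs and back. Both function names are hypothetical, and the batch size of 32 is an assumed value chosen only for the example; neither comes from the tutorial. The 200,000 sentences and 1,000,000 steps are the figures from the comments under the question.

```python
def epochs_from_total_steps(total_steps: int, num_samples: int, batch_size: int) -> float:
    """Epochs covered by `total_steps` minibatch updates (hypothetical helper).

    Each step consumes `batch_size` samples, so the result is the number of
    samples processed divided by the size of the training set.
    """
    return total_steps * batch_size / num_samples


def total_steps_from_epochs(num_epochs: int, num_samples: int, batch_size: int) -> int:
    """Total steps needed for `num_epochs` full passes (hypothetical helper).

    Assumes a trailing partial batch still counts as one step.
    """
    per_epoch = (num_samples + batch_size - 1) // batch_size
    return num_epochs * per_epoch


# 200,000 sentences and 1,000,000 steps, as in the question's comments.
print(epochs_from_total_steps(1_000_000, 200_000, batch_size=1))   # 5.0
print(epochs_from_total_steps(1_000_000, 200_000, batch_size=32))  # 160.0 (assumed batch size)
print(total_steps_from_epochs(5, 200_000, batch_size=32))          # 31250
```

So 1,000,000 steps correspond to 5 epochs only when each step processes a single sentence. With a larger batch, the same number of steps covers proportionally more epochs.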

interjay
  • The second paragraph is not right. Number_of_steps_per_epoch does not define number_of_epochs and vice versa. However, Number_of_steps_per_epoch defines batch_size and vice versa... – Nejla Dec 19 '17 at 02:35
  • @Nejla The question was not about the number of steps per epoch. It was about the relationship between the **total** number of steps and the total number of epochs. When the batch size and number of training samples are fixed, one of those determines the other. – interjay Dec 19 '17 at 11:46
  • Well, that's what you mentioned in your answer! Anyway, I suggest you add **total** to your answer as it's misleading... – Nejla Dec 20 '17 at 13:40
  • Is this your sentence: "So if you have 1,000,000 training samples and use a batch size of 100, one epoch will be equivalent to 10,000 steps, with 100 samples per step"? Then it's not talking about the total. It's about one epoch. So, buddy, just add total and eliminate the confusion. You talk about one epoch in the first paragraph and then your second paragraph refers to the total... It's clear in your mind, but not to the reader... – Nejla Dec 20 '17 at 13:43
  • @Nejla If one epoch is equivalent to 10000 steps as in that example, then N epochs are equivalent to 10000N total steps. Basic math. – interjay Dec 20 '17 at 13:56
  • Indeed, and yes, it's basic math! What I'm saying is that the term "step" in your first paragraph refers to one epoch, and then you use the same term "step" in your second paragraph, but this time you mean "total steps", not "steps per epoch", and that's misleading. Your first paragraph is perfect, but there is a problem with your second paragraph: what you've written is misleading. Change "the number of epochs or training steps" in your second paragraph to "the number of epochs or **total** training steps" and you eliminate the confusion. – Nejla Dec 20 '17 at 14:12
  • @Nejla I'd say that "number of X" would by default refer to the total number of X unless specified otherwise. But I edited my answer to say total. – interjay Dec 20 '17 at 14:20
  • Great. Thanks for the fix. – Nejla Dec 20 '17 at 14:22