While reading gensim's Doc2Vec documentation, I got a bit confused by some of the options. For example, the constructor of Doc2Vec has a parameter iter:
iter (int) – Number of iterations (epochs) over the corpus.
Why does the train method then also have a similar parameter called epochs?
epochs (int) – Number of iterations (epochs) over the corpus.
What is the difference between the two? There's one more paragraph on it in the docs:
To avoid common mistakes around the model’s ability to do multiple training passes itself, an explicit epochs argument MUST be provided. In the common and recommended case, where train() is only called once, the model’s cached iter value should be supplied as epochs value.
But I do not really understand why the constructor needs an iter parameter and what exactly should be provided for it.
EDIT:
I just saw that there is also the possibility to specify the corpus directly in the constructor rather than calling train() separately. So I think in this case, iter would be used and otherwise epochs. Is that correct?
If so, what is the difference between specifying the corpus in the constructor and calling train() manually? Why would one choose one over the other?
EDIT 2:
Although not mentioned in the docs, iter is now deprecated as a parameter of Doc2Vec. It was renamed to epochs to be consistent with the parameter of train(). Training seems to work with that, although I struggle with MemoryErrors.