
I am using Distributed Bag of Words (DBOW) and I'm curious what happens during a single epoch. Does DBOW cycle through all documents (i.e. a batch), or does it cycle through a subset of documents (i.e. a mini-batch)? In addition, for a given document, DBOW will randomly sample a word from a text window and learn the weights to associate that target word with the surrounding words in the window. Does this mean that DBOW may not go through all the text in a document?

I've gone through the gensim (https://github.com/RaRe-Technologies/gensim) code looking for a batch parameter, but had no luck.
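
For reference, here's a minimal sketch of how I'm setting up the model (the two-document corpus is just a placeholder; dm=0 is what selects the DBOW architecture). I see epochs, min_count, sample, and workers among the parameters, but nothing that looks like a batch or mini-batch size:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Placeholder corpus; in practice these are my real documents.
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, text in enumerate(["some sample text", "another document here"])]

# dm=0 selects DBOW. No batch-size-like parameter in sight.
model = Doc2Vec(docs, dm=0, vector_size=100, min_count=1,
                sample=1e-3, epochs=10, workers=4)
```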

chethanjjj

1 Answer


One epoch of PV-DBOW training in gensim Doc2Vec will iterate through all the texts, and for each text iterate through all of its words, attempting to predict each word in turn, then back-propagating corrections for that predicted word immediately. That is, there's no "mini-batching" at all: each target word is an individual prediction/back-propagation.
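
In rough pseudocode, an epoch looks like the sketch below. (This is illustrative only, not gensim's actual Cython implementation; it shows just the positive-example half of the negative-sampling objective, with the negative examples omitted for brevity.)

```python
import numpy as np

def dbow_epoch(corpus, doc_vecs, word_vecs, alpha=0.025):
    """Sketch of one PV-DBOW epoch: plain per-example SGD, one target
    word at a time, no mini-batches. corpus is [(tag, [words...]), ...];
    doc_vecs and word_vecs map tags/words to numpy arrays."""
    for tag, words in corpus:            # every document, every epoch
        for word in words:               # every surviving word, in order
            d = doc_vecs[tag]            # the doc-vector is the sole input
            w = word_vecs[word]          # output weights for the target word
            # One training example: predict this word from the doc-vector
            # (sigmoid output, desired label = 1.0).
            pred = 1.0 / (1.0 + np.exp(-np.dot(d, w)))
            g = (1.0 - pred) * alpha     # error * learning rate
            word_vecs[word] += g * d     # weights are corrected now...
            doc_vecs[tag] += g * w       # ...before the next word is seen
```

The point is the update granularity: the weights change after every single target word, so there is nothing being accumulated into a batch.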

(There is a sort-of-batching in how groups-of-texts are sent to worker threads, which can change the ordering somewhat, but each individual training-example presented to the neural-network is corrected individually, so no SGD-mini-batching is occurring.)

The words of each text are considered in order, and a word is only skipped if (a) it appeared fewer than min_count times, or (b) it is very frequent and is chosen for random dropping via the value of the sample parameter. So you can generally think of the training as including all significant words of every document.
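
For a concrete sense of the sample-based dropping, here's a sketch of the word2vec-style downsampling rule that gensim follows (the helper name is mine; check gensim's source for the exact internals):

```python
import math

def keep_probability(word_count, total_words, sample=1e-3):
    """Probability a word survives frequent-word downsampling."""
    if sample <= 0:
        return 1.0  # sample=0 disables downsampling entirely
    threshold = sample * total_words
    p = (math.sqrt(word_count / threshold) + 1) * threshold / word_count
    return min(1.0, p)  # infrequent words are always kept

# A word making up 5% of a 1M-word corpus is mostly dropped:
print(keep_probability(50_000, 1_000_000))  # ~0.16
```

So only the very most frequent words (which carry little signal per occurrence anyway) are thinned out; everything else is trained on every epoch.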

gojomo
  • Hmm interesting. So at each epoch, DBOW runs through all the words across the documents. Do you have a reference I could read more about this training? I was reading through the "Distributed Representations of Sentences and Documents" paper, but there was no explicit mention of what happens during an epoch. – chethanjjj Jun 14 '19 at 17:28
  • As the 'Paragraph Vector' paper says, "the only change in this model compared to the word vector framework is" one particular equation, it may help to read original word2vec papers, or other expositions. 'Paragraph vector'-style doc-vectors really just add a synthetic extra 'word' for each text that 'floats' into every word-prediction context, regardless of its distance from the target. The followup Dai/Olah/Le paper "Document Embedding with Paragraph Vectors" may be interesting. But generally, the original paper & the full source of an implementation like gensim's is the best reference. – gojomo Jun 15 '19 at 15:06