(1) Yes, word-vectors are trained simultaneously with doc-vectors in PV-DM mode (`dm=1`).
(2) The contents of the `wv` property before training happens are the randomly-initialized, untrained word-vectors. (As in word2vec, all vectors get random, low-magnitude starting positions.)
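For illustration, here's a minimal sketch of (1) & (2) – assuming gensim 4.x, with a made-up toy corpus and arbitrary parameter values:

```python
# Sketch only: gensim 4.x assumed; corpus & parameters are illustrative.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(words=["the", "cat", "sat", "down"], tags=[0]),
        TaggedDocument(words=["the", "dog", "ran", "away"], tags=[1])]

model = Doc2Vec(dm=1, vector_size=50, window=2, min_count=1, epochs=40)
model.build_vocab(docs)

before = model.wv["cat"].copy()   # random, low-magnitude initialization
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
after = model.wv["cat"]           # updated alongside the doc-vectors in PV-DM

print((before != after).any())    # True: the word-vectors were trained
```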
(3) In plain PV-DBOW mode (`dm=0`), because of code-sharing, the `wv` vectors are still allocated & initialized – but never trained. At the end of PV-DBOW training, the `wv` word-vectors will be unchanged, and thus random/useless. (They don't participate in training at all.)
If you enable the optional `dbow_words=1` parameter, then skip-gram word-vector training will be mixed in with plain PV-DBOW training. This will be done in an interleaved fashion, so each target word (to be predicted) will be used to train a PV-DBOW doc-vector, then the neighboring context word-vectors. As a result, the `wv` word-vectors will be trained, and in the "same space" for meaningful comparisons to doc-vectors.
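A sketch of that `dbow_words=1` variant, same assumptions as above:

```python
# PV-DBOW plus interleaved skip-gram word training (dbow_words=1).
model = Doc2Vec(dm=0, dbow_words=1, vector_size=50, window=2, min_count=1, epochs=40)
model.build_vocab(docs)
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)

# Word-vectors now live in the same space as doc-vectors, so comparisons
# like "which docs are closest to this word?" become meaningful.
print(model.dv.most_similar(positive=[model.wv["cat"]], topn=2))
```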
With this option, training will take longer than in plain PV-DBOW, by a factor related to the `window` size. For any particular end-purpose, the doc-vectors in this mode might be better (if the word-to-word predictions effectively extend the corpus in useful ways) or worse (if the effort spent on word-to-word predictions dilutes/overwhelms other patterns in the full-document doc-to-word predictions).