
I am trying to use gensim for doc2vec and word2vec.

Since the PV-DM approach can produce word-vectors and doc-vectors at the same time, I thought PV-DM was the right model to use.

So I created a model in gensim, specifying dm=1 for PV-DM.
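Roughly like this (a minimal sketch; the toy corpus is invented and the parameter names follow gensim 4.x):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# toy corpus: each document gets a list of words and a unique tag
docs = [TaggedDocument(words=["human", "interface", "computer"], tags=["doc0"]),
        TaggedDocument(words=["graph", "trees", "computer"], tags=["doc1"])]

# dm=1 selects the PV-DM training mode
model = Doc2Vec(docs, dm=1, vector_size=100, window=5, min_count=1, epochs=20)
```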

My questions are the following:

  1. Is it true that the word2vec model gets trained along with doc2vec when I call train() on the Doc2Vec object?

  2. It seems that the wv property contains the word-vectors, and is available even before training. Is this a static version of word2vec?

  3. I also created a DBOW model and noticed that it also contains wv. Is this the same static version of word2vec that I mentioned in the previous question?

Brandon Lee

1 Answer


(1) Yes, word-vectors are trained simultaneously with doc-vectors in PV-DM mode.

(2) Before any training happens, the wv property contains the randomly-initialized, untrained word-vectors. (As in word2vec, all vectors get random, low-magnitude starting positions.)
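A quick way to see both (1) and (2) in action (a sketch, using gensim 4.x attribute names; the tiny corpus is only for illustration):

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["human", "interface", "computer"], ["doc0"]),
        TaggedDocument(["graph", "trees", "computer"], ["doc1"])]

model = Doc2Vec(dm=1, vector_size=100, min_count=1, epochs=20)
model.build_vocab(docs)          # wv now exists: random, untrained vectors

before = model.wv["computer"].copy()
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)

# PV-DM training moved the word-vectors along with the doc-vectors
print(np.allclose(before, model.wv["computer"]))  # False
print(model.dv["doc0"].shape)                     # (100,) trained doc-vector
```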

(3) In plain PV-DBOW mode (dm=0), because of code-sharing, the wv vectors are still allocated & initialized – but never trained. At the end of PV-DBOW training, the wv word-vectors will be unchanged, and thus random/useless. (They don't participate in training at all.)
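You can verify this the same way (a sketch; same toy corpus and gensim 4.x names as above):

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["human", "interface", "computer"], ["doc0"]),
        TaggedDocument(["graph", "trees", "computer"], ["doc1"])]

model = Doc2Vec(dm=0, vector_size=100, min_count=1, epochs=20)  # plain PV-DBOW
model.build_vocab(docs)
before = model.wv["computer"].copy()
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)

# the word-vectors never participated in training, so they are unchanged
print(np.allclose(before, model.wv["computer"]))  # True
```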

If you enable the optional dbow_words=1 parameter, then skip-gram word-vector training will be mixed in with plain PV-DBOW training. This is done in an interleaved fashion: each target word (to be predicted) is used to train a PV-DBOW doc-vector, then the neighboring context word-vectors. As a result, the wv word-vectors will be trained, and in the "same space" for meaningful comparisons to doc-vectors.
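A sketch of that mode (same toy corpus as above; gensim 4.x names):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["human", "interface", "computer"], ["doc0"]),
        TaggedDocument(["graph", "trees", "computer"], ["doc1"])]

# dbow_words=1 interleaves skip-gram word training with PV-DBOW
model = Doc2Vec(docs, dm=0, dbow_words=1, vector_size=100,
                window=5, min_count=1, epochs=40)

# word-vectors and doc-vectors now share one space, so cross-type
# similarity queries become meaningful:
print(model.wv.most_similar(positive=[model.dv["doc0"]], topn=2))
```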

With this option, training will take longer than in plain PV-DBOW (by a factor related to the window size). For any particular end-purpose, the doc-vectors in this mode might be better (if the word-to-word predictions effectively extend the corpus in useful ways) or worse (if the effort spent on word-to-word predictions dilutes/overwhelms other patterns in the full-doc doc-to-word predictions).

gojomo