I'm trying to understand the PV-DM implementation with averaging in gensim. In the function train_document_dm in doc2vec.py, the return value ("errors") of train_cbow_pair is not divided by the number of input vectors (count) when averaging is used (cbow_mean=1). According to this explanation, averaging the input vectors should come with a division by the number of input vectors: word2vec Parameter Learning Explained, equation (23). Here is the code from train_document_dm:

l1 = np_sum(word_vectors[word2_indexes], axis=0)+np_sum(doctag_vectors[doctag_indexes], axis=0)  
count = len(word2_indexes) + len(doctag_indexes)  
if model.cbow_mean and count > 1:  
    l1 /= count  
neu1e = train_cbow_pair(model, word, word2_indexes, l1, alpha,
                        learn_vectors=False, learn_hidden=learn_hidden)
if not model.cbow_mean and count > 1:  
    neu1e /= count  
if learn_doctags:  
    for i in doctag_indexes:  
        doctag_vectors[i] += neu1e * doctag_locks[i]  
if learn_words:  
    for i in word2_indexes:  
        word_vectors[i] += neu1e * word_locks[i]  
саша

1 Answer

Let's say V is defined as the average of A, B, and C:

V = (A + B + C) / 3

Let's set A = 5, B = 6, and C = 10. And let's say we want V to equal 10.

We run the calculation (forward propagation), and the value of V, the average of the three numbers, is 7. Thus the correction needed for V is +3.

To apply this correction to A, B, and C, do we also divide that correction by 3, to get +1 against each? In that case A = 6, B = 7, and C = 11 – and now V is just 8. It'd still need another +2 to match the target.

So, no. The proper correction to all of the components of V, in the case where V is an average, is the same as the correction to V: in this case, +3. If we were to apply that, we'd reach our proper target value of 10:

A = 8, B = 9, C = 13
V = (8 + 9 + 13) / 3 = 10
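
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same toy example. The names (`mean`, `components`, `split`, `full`) are just for illustration and have nothing to do with gensim's internals.

```python
def mean(values):
    """Forward pass of the toy example: V is the average of its components."""
    return sum(values) / len(values)

components = [5.0, 6.0, 10.0]   # A, B, C
target = 10.0                   # the value we want V to reach

# Correction needed for V itself: 10 - 7 = +3
error = target - mean(components)

# Option 1: divide the correction by the number of components (+1 each)
split = [x + error / len(components) for x in components]

# Option 2: apply the full correction to every component (+3 each)
full = [x + error for x in components]

print(mean(split))  # 8.0  (still 2 short of the target)
print(mean(full))   # 10.0 (hits the target)
```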

The same thing is happening in the gensim backpropagation. In the case of averaging, the full corrective value (times the learning-rate alpha) is applied to each of the constituent vectors.
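
To see the role of alpha, here is a rough sketch with small vectors instead of single numbers. It borrows the names `l1` and `neu1e` from the snippet in the question, but the error formula (alpha times the gap to a fixed target) is a toy stand-in I made up for illustration. It is not what train_cbow_pair actually computes.

```python
def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train_step(vectors, target, alpha):
    """One toy update: the same alpha-scaled error goes to every vector."""
    l1 = average(vectors)
    # Toy error signal (an assumption, not gensim's): scaled gap to the target
    neu1e = [alpha * (t - v) for t, v in zip(target, l1)]
    # Undivided correction applied to each constituent vector
    return [[x + e for x, e in zip(vec, neu1e)] for vec in vectors]

vectors = [[5.0, 0.0], [6.0, 2.0], [10.0, 4.0]]
target = [10.0, 4.0]

for _ in range(20):
    vectors = train_step(vectors, target, alpha=0.5)

print(average(vectors))  # very close to the target [10.0, 4.0]
```

With alpha below 1, each step closes only part of the gap, so the average creeps toward the target over repeated steps rather than jumping there at once.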

(If using a sum-of-vectors to create V instead, then the error would need to be divided by the count of constituent vectors - to split the error over them all, and not apply it redundantly.)
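
The sum case can be checked the same way. This is again only a sketch: the target of 24 is a number I picked so that the needed correction is +3, as in the averaging example.

```python
components = [5.0, 6.0, 10.0]   # A, B, C
target = 24.0                   # hypothetical target for V = A + B + C

# Correction needed for V itself: 24 - 21 = +3
error = target - sum(components)

# Applying the full correction to every component counts it three times
redundant = [x + error for x in components]

# Dividing by the count splits the correction across the components
shared = [x + error / len(components) for x in components]

print(sum(redundant))  # 30.0 (overshoots the target by 6)
print(sum(shared))     # 24.0 (hits the target)
```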

gojomo