
I have a word2vec model for every user, so I can see how the same word is represented across different users' models. Is there a more optimized way to compare the trained models than this?

from gensim.models import Word2Vec
import numpy as np

userAvec = Word2Vec.load('userAvec.w2v')
userBvec = Word2Vec.load('userBvec.w2v')

# for a word in the vocab, compute cosine similarity via dot product:
vecA = userAvec.wv['president']
vecB = userBvec.wv['president']
cosine_similarity = np.dot(vecA, vecB) / (np.linalg.norm(vecA) * np.linalg.norm(vecB))

Is this the best way to compare two models? Is there a stronger way to see how two models compare than word by word? Picture 1000 users/models, each with a similar number of words in its vocabulary.


1 Answer


There's a faulty assumption at the heart of your question.

If the models userAvec and userBvec were trained in separate sessions, on separate data, the calculated angle between userAvec['president'] and userBvec['president'] is, alone, essentially meaningless. There's randomness in the algorithm's initialization, and then in most modes of training (via things like negative sampling, frequent-word downsampling, and arbitrary reordering of training examples due to thread-scheduling variability). As a result, even repeated model-training with the exact same corpus and parameters can result in different coordinates for the same words.
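
To make that concrete, here's a minimal sketch of two training runs on the exact same toy corpus, differing only in seed. (The corpus and parameters are illustrative, and the parameter names assume a recent gensim where the dimensionality argument is vector_size; older versions call it size.)

from gensim.models import Word2Vec

# toy corpus, just for illustration
sentences = [["the", "president", "spoke"],
             ["the", "president", "signed", "the", "bill"]] * 200

# two separate training runs on the exact same data, differing only in seed
m1 = Word2Vec(sentences, vector_size=50, min_count=1, workers=1, seed=1)
m2 = Word2Vec(sentences, vector_size=50, min_count=1, workers=1, seed=2)

# the raw coordinates for the same word differ between runs,
# so a cosine between them across models is not meaningful
print(m1.wv['president'][:5])
print(m2.wv['president'][:5])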

It's only the relative distances/directions, among words that were co-trained in the same iterative process, that have significance.

So it might be interesting to compare whether the two models' lists of top-N similar words, for a particular word, are similar (one way to do this is sketched below). But the raw value of the angle, between the coordinates of the same word in alternate models, isn't a meaningful measure.
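
For example, here's one rough way to score that, using a hypothetical neighbor_overlap helper with Jaccard overlap as one (arbitrary) choice of set-similarity measure, and the file paths from the question:

from gensim.models import Word2Vec

def neighbor_overlap(model_a, model_b, word, topn=10):
    # Jaccard overlap between the two models' top-N neighbor lists for `word`;
    # assumes `word` is present in both models' vocabularies
    neighbors_a = {w for w, _ in model_a.wv.most_similar(word, topn=topn)}
    neighbors_b = {w for w, _ in model_b.wv.most_similar(word, topn=topn)}
    return len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)

userAvec = Word2Vec.load('userAvec.w2v')
userBvec = Word2Vec.load('userBvec.w2v')
print(neighbor_overlap(userAvec, userBvec, 'president', topn=10))

A score near 1.0 means the word sits in a very similar neighborhood in both models; near 0.0 means its neighborhoods barely overlap. Averaging this over many shared vocabulary words could give a per-pair model-similarity score, though with 1000 models the number of pairwise comparisons grows quickly.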
