A few notes:
That 'accuracy()' test is only a test of word-vectors on analogy problems – an easy evaluation to run, used in a number of papers, but not the final authority on whether a set of word-vectors is better than others for a particular purpose. (When I've had a project-specific scoring method, sometimes the word-vectors that score best on project-specific goals don't score best on those analogies – especially if the word-vectors are being used for a classification or information-retrieval task.)
Further, the popular and fast PV-DBOW `Doc2Vec` mode (`dm=0` in gensim) doesn't train word-vectors at all, unless you add another setting (`dbow_words=1`). Such untrained word-vectors will be in random locations, scoring awfully on the analogies-accuracy.
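To make the mode combinations concrete, here's a rough sketch. The `dm` and `dbow_words` parameter names are gensim's; the constant names and the `trains_word_vectors()` / `build_doc2vec()` helpers are just hypothetical names I made up for illustration, and all other hyperparameters are left at their defaults.

```python
# Keyword arguments for the Doc2Vec modes discussed above.
# `dm` and `dbow_words` are gensim parameters; the constant names are illustrative.
PV_DBOW_DOCS_ONLY = {"dm": 0, "dbow_words": 0}   # doc-vectors only; word-vectors stay random
PV_DBOW_WITH_WORDS = {"dm": 0, "dbow_words": 1}  # PV-DBOW plus interleaved word-vector training
PV_DM = {"dm": 1}                                # PV-DM trains word-vectors alongside doc-vectors


def trains_word_vectors(dm=1, dbow_words=0):
    """Return True if this mode combination trains word-vectors.

    Hypothetical helper: it only encodes the rule described in the text.
    The defaults mirror gensim's (dm=1, dbow_words=0).
    """
    return dm == 1 or dbow_words == 1


def build_doc2vec(tagged_docs, **mode):
    """Train a Doc2Vec model in the given mode (requires gensim).

    Hypothetical helper; `tagged_docs` is assumed to be an iterable of
    gensim TaggedDocument objects.
    """
    from gensim.models.doc2vec import Doc2Vec  # imported lazily, only when called
    return Doc2Vec(documents=tagged_docs, **mode)
```

So `build_doc2vec(corpus, **PV_DBOW_WITH_WORDS)` would give a model whose word-vectors are worth evaluating, while `PV_DBOW_DOCS_ONLY` would not.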
But, using either PV-DM (`dm=1`) mode, or adding `dbow_words=1` to PV-DBOW, will get word-vectors from `Doc2Vec`, and you might still want to run the analogies test. Fortunately, analogy-evaluation options have been retained & even expanded on the `KeyedVectors` object that's held in the `Doc2Vec` `wv` property. You can call the old `accuracy()` method there:
https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.accuracy
But there's also a slightly different kind of scoring, `evaluate_word_pairs()`, which compares the model's word similarities against human-rated word pairs:
https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.evaluate_word_pairs
(And in the 4.0.0 release there'll be an [`evaluate_word_analogies()`][1] which replaces `accuracy()`.)
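
As a rough sketch of how those calls might be wrapped, assuming a trained model whose `wv` has real word-vectors: the `score_analogies()` and `score_word_pairs()` helper names are hypothetical, and the code simply uses whichever analogy method the installed gensim offers.

```python
def score_analogies(model, questions_path):
    """Return the overall analogy accuracy (0.0 to 1.0) for model.wv.

    Hypothetical helper; `questions_path` is assumed to point at an analogy
    file in the questions-words.txt format.
    """
    wv = model.wv
    if hasattr(wv, "evaluate_word_analogies"):
        # Newer method: returns (overall_score, per_section_details).
        score, _sections = wv.evaluate_word_analogies(questions_path)
        return score
    # Older method: accuracy() returns a list of section dicts; the last
    # one is the 'total', holding lists of correct/incorrect analogies.
    total = wv.accuracy(questions_path)[-1]
    n_correct, n_incorrect = len(total["correct"]), len(total["incorrect"])
    attempted = n_correct + n_incorrect
    return n_correct / attempted if attempted else 0.0


def score_word_pairs(model, pairs_path):
    """Return the Spearman correlation with human similarity ratings.

    Hypothetical helper; `pairs_path` is assumed to be a tab-separated file
    of word pairs with a similarity score, such as wordsim353.tsv.
    """
    _pearson, spearman, _oov_ratio = model.wv.evaluate_word_pairs(pairs_path)
    return spearman[0]  # (correlation, p-value) -> correlation
```

For a quick try, gensim ships small copies of both file types in its test data, reachable with `datapath('questions-words.txt')` and `datapath('wordsim353.tsv')` from `gensim.test.utils`.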