I am working with Doc2Vec from the gensim library, and I want to get the similarities (with probabilities) for all documents, not only the top 10 returned by model.docvecs.most_similar().

Once my model is trained:

In [1]: print(model)
Out [1]: Doc2vec(...)

If I use model.docvecs.most_similar(), I get only the top 10 similar docs:

In [2]: model.docvecs.most_similar('1')
Out [2]: [('2007', 0.9171321988105774),
 ('606', 0.5638039708137512),
 ('2578', 0.530228853225708),
 ('4506', 0.5193327069282532),
 ('2550', 0.5178008675575256),
 ('4620', 0.5098666548728943),
 ('1296', 0.5071642994880676),
 ('3943', 0.5070815086364746),
 ('438', 0.5057751536369324),
 ('1922', 0.5048809051513672)]

I am looking to get all probabilities, not only the top 10, for some analysis.

Thanks for your help :)

Oussama Jabri

1 Answer

most_similar() takes an optional topn parameter, with a default value of 10, meaning just the top 10 results will be returned.

If you supply another integer, such as the total number of doc-vectors known to the model, then that many sorted results will be provided.

(You can also supply Python None, which returns all similarities unsorted, in the same order as the vectors are stored in the model.)
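As a minimal sketch of the two options above, assuming a trained model that exposes the same model.docvecs attribute used in the question (newer gensim releases name it model.dv instead), the calls could be wrapped like this. The helper names ranked_similarities and raw_similarities are hypothetical and not part of gensim:

```python
def ranked_similarities(model, doc_tag):
    """Ask for as many sorted (tag, similarity) results as there are doc-vectors."""
    docvecs = model.docvecs  # assumption: attribute name as used in the question
    return docvecs.most_similar(doc_tag, topn=len(docvecs))


def raw_similarities(model, doc_tag):
    """Ask for all similarities, unsorted, in the order the vectors are stored."""
    return model.docvecs.most_similar(doc_tag, topn=None)
```

With the model from the question, ranked_similarities(model, '1') would give the full sorted list, while raw_similarities(model, '1') is the better fit when the scores need to be lined up with the stored vectors by position.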

Note these values are cosine similarities, with a range of values from -1.0 to 1.0, not 'probabilities'.
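To illustrate why these scores are not probabilities, here is a plain-Python sketch of cosine similarity on toy vectors. It is not gensim's implementation, and the vectors are made up for the example:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors: same direction, opposite direction, and at right angles.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])          # close to 1.0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])    # close to -1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])    # 0.0
```

The scores can be negative, and the scores for one query need not sum to 1, so they should be read as a measure of direction agreement rather than as a probability distribution.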

gojomo