How to find similarity between two lists of strings using Doc2Vec?

Question

I have lists of strings like the ones below. I would like to compute the similarity between list1 and list2 using Doc2Vec.

list1 = [['i','love','machine','learning','its','awesome'],['i', 'love', 'coding', 'in', 'python'],['i', 'love', 'building', 'chatbots']]
list2 = ['i', 'love', 'chatbots']

It's not clear at all what you are asking for – ncica May 27 '19 at 13:11

score 0 · Accepted Answer · answered May 27 '19 at 18:28

If you're using the Doc2Vec implementation in the gensim library, there are intro notebooks that cover this. See, for example, the file doc2vec-lee.ipynb, which is inside the gensim docs/notebooks directory (where you can and should run it locally), or viewable online at:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb

Note that:

you'll need a model trained on far more data: ideally tens of thousands of texts or more, each text being at least a sentence
if the two texts you want to compare were part of your training set, you can retrieve the learned doc-vectors from the model
if the two texts you want to compare are not part of the training set, you can infer doc-vectors for them, using the model, as is shown in that notebook

