Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
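A simplified sketch of how the two modes differ, in terms of which (input, target) examples each one trains on. This is not gensim's implementation. The function names are hypothetical, the window is fixed instead of randomly shrunk, and the actual learning (negative sampling or hierarchical softmax, plus gradient updates of the tag and word vectors) is left out.

```python
from typing import List, Tuple

Example = Tuple[Tuple[str, ...], str]  # (input tokens, target word)


def dbow_examples(tag: str, words: List[str]) -> List[Example]:
    """PV-DBOW (like skip-gram): the document tag alone predicts each word."""
    return [((tag,), word) for word in words]


def dm_examples(tag: str, words: List[str], window: int = 2) -> List[Example]:
    """PV-DM (like CBOW): the tag plus nearby context words predict the centre word."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]     # up to `window` words before
        right = words[i + 1:i + 1 + window]    # up to `window` words after
        examples.append(((tag, *left, *right), target))
    return examples


doc = ["the", "cat", "sat", "down"]
for inputs, target in dbow_examples("DOC_0", doc):
    print("DBOW", inputs, "->", target)
for inputs, target in dm_examples("DOC_0", doc, window=1):
    print("DM  ", inputs, "->", target)
```

In gensim the mode is chosen with `dm=0` (PV-DBOW) or `dm=1` (PV-DM); how PV-DM combines the tag and context vectors (sum, mean, or concatenation) depends on settings such as `dm_mean` and `dm_concat`.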
Questions tagged [doc2vec]
556 questions
3 votes, 1 answer
Document similarity in production environment
We have n documents. When a user submits a new document, our goal is to warn them about a possible duplicate of an existing document (just as Stack Overflow suggests questions that may already have an answer).
In our system, new document…

user2578525
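One simple way to implement a "possible duplicate" check is a nearest-neighbour search by cosine similarity: obtain a vector for the new document (with gensim this would typically come from `infer_vector()`), compare it against the stored vectors of existing documents, and report those above a threshold. In this sketch the vectors are made-up 3-dimensional toys, and both the 0.8 threshold and the `top_n` cutoff are hypothetical values that would need tuning on real data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def possible_duplicates(new_vec: Sequence[float],
                        existing: Dict[str, Sequence[float]],
                        threshold: float = 0.8,
                        top_n: int = 3) -> List[Tuple[str, float]]:
    """Existing documents whose similarity to new_vec is at least `threshold`,
    most similar first, capped at `top_n` results."""
    scored = [(doc_id, cosine(new_vec, vec)) for doc_id, vec in existing.items()]
    scored = [item for item in scored if item[1] >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


existing = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.8, 0.2, 0.1],
}
new_doc_vec = [1.0, 0.1, 0.0]
print(possible_duplicates(new_doc_vec, existing))
```

Scoring every stored vector costs time linear in the number of documents; for a large collection, an approximate nearest-neighbour index is the usual substitute, at the cost of occasionally missing a true neighbour.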
3 votes, 1 answer
Gensim Doc2Vec most_similar() method not working as expected
I am struggling with Doc2Vec and I cannot see what I am doing wrong.
I have a text file with sentences. I want to know, for a given sentence, what is the closest sentence we can find in that file.
Here is the code for model creation:
sentences =…

Yann Droy
3 votes, 1 answer
Gensim's Doc2vec - inferred vector isn't similar
When I train Doc2vec (using Gensim's Doc2vec in Python) on a corpus of about 10k documents (each a few hundred words long) and then infer document vectors for the same documents, they are not at all similar to the trained document vectors. I would…

awa993
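Inference is itself a short training run starting from a random vector, so an inferred vector will never exactly equal the trained one, but for a reasonably trained model it should usually be that trained vector's nearest neighbour. A sanity check along the lines of the one in gensim's Doc2Vec tutorial: rank each document's own trained vector among all trained vectors by cosine similarity to its inferred vector, and count how often it comes first (rank 0). The vectors here are toy values and `self_rank` is a hypothetical helper; in gensim they would come from `infer_vector()` and from `model.dv` (`model.docvecs` in 3.x).

```python
import math
from collections import Counter
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def self_rank(inferred_vec: Sequence[float], trained: Dict[str, Sequence[float]], doc_id: str) -> int:
    """Position of doc_id (0 = most similar) when all trained vectors are
    sorted by cosine similarity to the document's inferred vector."""
    ranking = sorted(trained, key=lambda k: cosine(inferred_vec, trained[k]), reverse=True)
    return ranking.index(doc_id)


# Toy vectors standing in for trained and inferred document vectors.
trained = {"d0": [1.0, 0.0], "d1": [0.0, 1.0], "d2": [0.7, 0.7]}
inferred = {"d0": [0.9, 0.1], "d1": [0.2, 0.9], "d2": [0.1, 0.9]}

rank_counts = Counter(self_rank(inferred[d], trained, d) for d in trained)
print(rank_counts)  # how many documents ranked themselves 0th, 1st, ...
```

A count dominated by rank 0 suggests inference and training agree. If most documents land far from rank 0, common suspects are too few training epochs or inference text tokenized differently from the training text.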
3 votes, 0 answers
Embedding Gensim Doc2Vec Tensorboard
I have a set of documents in a df. I am transforming those documents to vectors with gensim Doc2Vec:
def read_corpus(documents):
    for i, plot in enumerate(documents):
        yield…

OverflowingTheGlass
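The TensorBoard Embedding Projector can load embeddings from two tab-separated files: one vector per line, plus a matching metadata file with one label per line (no header row when there is only a single metadata column). A minimal sketch of writing both, assuming the document vectors have already been collected into a dict keyed by a label; `write_projector_tsv` is a hypothetical name, and `StringIO` buffers stand in for real files such as `vectors.tsv` and `metadata.tsv`.

```python
import io
from typing import Dict, Sequence, TextIO


def write_projector_tsv(vectors: Dict[str, Sequence[float]], vec_out: TextIO, meta_out: TextIO) -> None:
    """Write one tab-separated vector per line and, in the same order,
    one label per line (single-column metadata, so no header row)."""
    for label, vec in vectors.items():
        vec_out.write("\t".join(str(x) for x in vec) + "\n")
        # Tabs or newlines inside a label would break the TSV layout.
        meta_out.write(label.replace("\t", " ").replace("\n", " ") + "\n")


doc_vectors = {"doc_0": [0.1, -0.2], "doc_1": [0.3, 0.4]}
vec_buf, meta_buf = io.StringIO(), io.StringIO()
write_projector_tsv(doc_vectors, vec_buf, meta_buf)
print(vec_buf.getvalue())
print(meta_buf.getvalue())
```

gensim also ships a `word2vec2tensor` script that produces the same kind of files from word vectors saved in word2vec format.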
3 votes, 1 answer
How to use build_vocab in gensim?
Does build_vocab extend my old vocabulary?
For example, my understanding is that when I use doc2vec to train a model, it just builds the vocabulary from the dataset. If I want to extend it, I need to use build_vocab().
Where should I use it? Should I put it…

Cherrymelon
3 votes, 1 answer
Updating training documents for gensim Doc2Vec model
I have an existing gensim Doc2Vec model, and I'm trying to do iterative updates to the training set, and by extension, the model.
I take the new documents and perform preprocessing as normal:
stoplist =…

Brian O'Halloran
3 votes, 1 answer
What are doc2vec training iterations?
I am new to doc2vec. I was initially trying to understand doc2vec, and below is my code that uses Gensim. As intended, I get a trained model and document vectors for the two documents.
However, I would like to know the benefits of retraining…
user8566323
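An epoch (the parameter is called `iter` in older gensim versions and `epochs` in newer ones) is one full pass over the training corpus. Each pass nudges the document and word vectors again, and gensim lowers the learning rate linearly from `alpha` to `min_alpha` over the whole run, so later passes make smaller adjustments. This sketch shows that schedule at the start of each epoch. The values 0.025 and 0.0001 match gensim's documented defaults, but the function names are hypothetical, and gensim also keeps decaying the rate within each epoch rather than only between them.

```python
from typing import List


def alpha_at(progress: float, alpha: float = 0.025, min_alpha: float = 0.0001) -> float:
    """Linearly interpolated learning rate after `progress` (0.0 to 1.0)
    of the whole training run."""
    return alpha - (alpha - min_alpha) * progress


def epoch_start_alphas(epochs: int, alpha: float = 0.025, min_alpha: float = 0.0001) -> List[float]:
    """Learning rate at the start of each of `epochs` passes over the corpus."""
    return [alpha_at(e / epochs, alpha, min_alpha) for e in range(epochs)]


print([round(a, 5) for a in epoch_start_alphas(5)])
```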
3 votes, 2 answers
User2Vec? representing a user based on the docs they consume
I'd like to form a representation of users based on the last N documents they have liked.
So I'm planning on using doc2vec to form this representation of each document, but I'm just trying to figure out what would be a good way to essentially place…

andrewm4894
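A common baseline for this kind of "User2Vec" is to average the vectors of the documents the user liked, optionally weighting some likes (for example the most recent) more heavily. A minimal sketch, assuming each liked document already has a vector from a trained model or `infer_vector()`; `user_vector` is a hypothetical helper and the weights shown are arbitrary.

```python
from typing import List, Optional, Sequence


def user_vector(doc_vecs: List[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean of the user's liked-document vectors
    (plain mean when no weights are given)."""
    if not doc_vecs:
        raise ValueError("need at least one document vector")
    if weights is None:
        weights = [1.0] * len(doc_vecs)
    total = sum(weights)
    dim = len(doc_vecs[0])
    return [sum(w * v[i] for w, v in zip(weights, doc_vecs)) / total for i in range(dim)]


liked = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # oldest to newest
print(user_vector(liked))                          # plain average
print(user_vector(liked, weights=[1.0, 1.0, 2.0])) # newest like counts double
```

Because the result lives in the same space as the document vectors, it can be compared with candidate documents by cosine similarity. A single average can blur a user with several distinct interests, which is one reason to consider keeping more than one vector per user.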
3 votes, 1 answer
Why are almost all cosine similarities positive between word or document vectors in gensim doc2vec?
I have calculated document similarities using Doc2Vec.docvecs.similarity() in gensim. Now, I would either expect the cosine similarities to lie in the range [0.0, 1.0] if gensim used the absolute value of the cosine as the similarity metric, or…

Sami Liedes
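Cosine similarity ranges over [-1, 1], and gensim's similarity methods return the plain cosine, not its absolute value; negative scores require vectors pointing in roughly opposite directions. One purely geometric effect that pushes cosines toward positive values is a shared offset: if all vectors contain a common component, they all lean the same way. The toy numbers in this sketch illustrate that, and subtracting the mean vector restores a negative similarity. Whether this explains a particular trained model is an assumption to check, not a given.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_center(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Subtract the mean vector from every vector."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite directions

vecs = [[6.0, 5.0], [4.0, 5.0]]          # both share the offset [5, 5]
print(cosine(*vecs))                     # strongly positive
print(cosine(*mean_center(vecs)))        # offset removed
```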
3 votes, 2 answers
Gensim docvecs.most_similar returns IDs that don't exist
I'm trying to create an algorithm that can show the top n documents similar to a specific document.
For that I used gensim's Doc2Vec. The code is below:
model = gensim.models.doc2vec.Doc2Vec(size=400, window=8, min_count=5, workers = 11,…

JoaoSilva
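One known way to get this symptom with gensim: when the tags are plain Python ints, gensim treats them as direct indices into its array of document vectors, so tags such as 17 and 9000 make it allocate a slot for every int up to 9000. The unused slots keep only their random initialization yet can still appear in `most_similar()` results. Whether that is what happens in this question is an assumption, since the code is truncated. This sketch remaps arbitrary IDs to a contiguous 0..n-1 range and keeps the reverse mapping for translating results back; `contiguous_tags` is a hypothetical helper. Using string tags such as `str(doc_id)` is the other usual fix.

```python
from typing import Dict, List, Tuple


def contiguous_tags(raw_ids: List[int]) -> Tuple[Dict[int, int], List[int]]:
    """Map arbitrary (possibly sparse) integer IDs to 0..n-1.

    Returns (to_index, to_raw): to_index[raw_id] is the tag to train with,
    to_raw[tag] recovers the original ID from a most_similar() result."""
    to_index: Dict[int, int] = {}
    to_raw: List[int] = []
    for raw in raw_ids:
        if raw not in to_index:          # repeated IDs reuse their first tag
            to_index[raw] = len(to_raw)
            to_raw.append(raw)
    return to_index, to_raw


raw_ids = [17, 42, 9000, 42]
to_index, to_raw = contiguous_tags(raw_ids)
print(to_index)
print(to_raw)
```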
3 votes, 2 answers
load pre-trained word2vec model for doc2vec
I'm using gensim to extract feature vectors from a document.
I've downloaded the pre-trained model from Google named GoogleNews-vectors-negative300.bin and I loaded that model using the following command:
model =…

lenhhoxung
3 votes, 1 answer
Doc2Vec model Python 3 compatibility
I trained a doc2vec model with Python2 and I would like to use it in Python3.
When I try to load it in Python 3, I get:
Doc2Vec.load('my_doc2vec.pkl')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in…

Bernard
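The error text matches how Python 3's `pickle` handles Python 2 `str` objects: by default it decodes them as ASCII, which fails on any byte above 0x7F. The standard-library `encoding` argument changes that; `"latin1"` maps every byte to a character and never fails, which is the commonly recommended choice for pickles containing NumPy arrays. This sketch uses a hand-assembled protocol-2 pickle of a Python 2 byte string to reproduce the error without needing Python 2. Whether this alone is enough for a particular saved model depends on the gensim version and on how the file was written, so treat it as a diagnostic illustration.

```python
import pickle

# Hand-assembled protocol-2 pickle holding a Python 2 byte string "\xb0\x01":
# PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x02\xb0\x01q\x00."

try:
    pickle.loads(PY2_PICKLE)  # default encoding="ASCII"
except UnicodeDecodeError as err:
    print("default:", err)

# latin1 turns each byte into one character; "bytes" keeps them as bytes.
print("latin1:", repr(pickle.loads(PY2_PICKLE, encoding="latin1")))
print("bytes: ", repr(pickle.loads(PY2_PICKLE, encoding="bytes")))
```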
3 votes, 0 answers
Is there any way to validate the performance of a Doc2Vec/Word2Vec Deep Learning model?
I am working with the Doc2Vec and Word2Vec deep learning algorithms (Doc2Vec API description from Gensim). More description here
Currently I am interested in using the model.n_similarity(wordSet1, wordSet2) method which basically computes the …

Uther Pendragon
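To my knowledge, gensim's `KeyedVectors.n_similarity(ws1, ws2)` returns the cosine similarity between the mean vector of the first word list and the mean vector of the second. A pure-Python sketch of that computation follows, with a made-up three-word vocabulary; the dict `kv` stands in for a real model's word vectors, and raising `ValueError` on an empty list is this sketch's choice.

```python
import math
from typing import Dict, List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def n_similarity(kv: Dict[str, Sequence[float]], ws1: List[str], ws2: List[str]) -> float:
    """Cosine similarity between the mean vectors of two word lists."""
    if not ws1 or not ws2:
        raise ValueError("both word lists must be non-empty")
    m1 = unit(mean_vector([kv[w] for w in ws1]))
    m2 = unit(mean_vector([kv[w] for w in ws2]))
    return sum(a * b for a, b in zip(m1, m2))


kv = {"king": [1.0, 0.2], "queen": [0.9, 0.4], "apple": [0.1, 1.0]}
print(n_similarity(kv, ["king"], ["queen"]))
print(n_similarity(kv, ["king", "queen"], ["apple"]))
```

Such a score is only as meaningful as the benchmark it is checked against. gensim provides `KeyedVectors.evaluate_word_pairs()` for comparison with human similarity judgements, though the most telling test is usually the downstream task itself.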
2 votes, 2 answers
doc2vec infer words from vectors
I am clustering comments.
After preprocessing and vectorizing the text, I inferred vectors from my doc2vec model and applied k-means.
After that I want to convert cluster centroid vectors to words to kinda look at the semantic cores of the…

frogseer
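A k-means centroid is the mean of its cluster's vectors, and "words for a centroid" can be approximated by listing the word vectors closest to it by cosine similarity. This only makes sense when document and word vectors share one space: true for PV-DM, and for PV-DBOW only when word training is interleaved (gensim's `dbow_words=1`), since plain PV-DBOW leaves word vectors untrained. The vectors here are toy values and the helper names are hypothetical; with gensim, `model.wv.similar_by_vector(centroid, topn=...)` performs the nearest-word lookup.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Mean of the vectors, i.e. what k-means uses as a cluster centre."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_words(vec: Sequence[float], word_vectors: Dict[str, Sequence[float]], topn: int = 2) -> List[str]:
    """The `topn` words whose vectors are most cosine-similar to `vec`."""
    ranked = sorted(word_vectors, key=lambda w: cosine(vec, word_vectors[w]), reverse=True)
    return ranked[:topn]


word_vectors = {
    "price": [1.0, 0.1], "cost": [0.9, 0.2],
    "delivery": [0.1, 1.0], "shipping": [0.2, 0.9],
}
cluster_doc_vecs = [[0.95, 0.15], [0.85, 0.25]]  # inferred vectors of one cluster
center = centroid(cluster_doc_vecs)
print(nearest_words(center, word_vectors))
```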
2 votes, 1 answer
Run a model that needs an older gensim version
I need to run a model, but it needs an older version of gensim with the DocvecsArray attribute. How can I run it?
AttributeError: Can't get attribute 'DocvecsArray' on

Ayshath Thasmiya