Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
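The difference between the two modes can be illustrated by looking at which (input, target) pairs each one would train on. The following is only a minimal sketch in plain Python, not Gensim's implementation: the function names `pv_dbow_examples` and `pv_dm_examples`, the tag `DOC_0`, and the symmetric context window are all assumptions made for illustration, and no actual training happens here.

```python
from typing import List, Tuple


def pv_dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW (skip-gram-like): the document tag alone predicts each word."""
    return [(tag, target) for target in words]


def pv_dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM (CBOW-like): the tag plus nearby context words predict the centre word.

    A symmetric window is assumed here; it is a hypothetical simplification.
    """
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]        # up to `window` words before
        right = words[i + 1:i + 1 + window]       # up to `window` words after
        examples.append(((tag, *left, *right), target))
    return examples


doc = ["the", "cat", "sat", "down"]
dbow = pv_dbow_examples("DOC_0", doc)
dm = pv_dm_examples("DOC_0", doc, window=1)

for inputs, target in dm:
    print(inputs, "->", target)
```

In PV-DBOW the word order inside the document plays no role in the inputs, while in PV-DM the document tag acts as an extra context item that is present in every window.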
Questions tagged [doc2vec]
556 questions
0
votes
1 answer
'Doc2Vec' object has no attribute 'syn0'
import gensim
from gensim.models.doc2vec import TaggedDocument
taggeddocs = []
tag2tweetmap = {}
for index, i in enumerate(cleaned_tweets):
    if len(i) > 2:  # Non-empty tweets
        tag = u'SENT_{:d}'.format(index)
        sentence =…

Dylan
0
votes
1 answer
Can I build the vocabulary twice with gensim word2vec or doc2vec?
I have two different corpora and I want to train the model on both. To do this, I thought it could be something like this:
model.build_vocab(sentencesCorpus1)
model.build_vocab(sentencesCorpus2)
Would it be right?

Mikel Laburu
0
votes
1 answer
add new vocabulary to existing Doc2vec model
I already have a Doc2Vec model. I have trained it with my training data.
Now after a while I want to use Doc2Vec for my test data. I want to add my test data vocabulary to my existing model's vocabulary. How can I do this?
I mean how can I update my…
0
votes
1 answer
Gensim Doc2Vec Most_Similar
I'm having trouble with the most_similar method in Gensim's Doc2Vec model. When I run most_similar I only get the similarity of the first 10 tagged documents (based on their tags, always from 0-9). For this code I have topn=5, but I've used…

J. Collins
0
votes
1 answer
Do the Mikolov 2014 Paragraph2Vec models assume sentence ordering?
In the Mikolov 2014 paper on Paragraph Vectors, https://arxiv.org/pdf/1405.4053v2.pdf, do the authors assume, in both PV-DM and PV-DBOW, that the ordering of sentences needs to make sense?
Imagine I am handling a stream of tweets, and each tweet is a…

Franklin Dong
0
votes
1 answer
Concatenating two doc2vec models: Vector dimensions doubled
I have a question regarding concatenating two doc2vec models. I followed the official gensim IMDB example on doc2vec and implemented example data.
When concatenating two models (PV-DM + PV-DBOW), as outlined in the original paper, I wondered that…

Christopher
0
votes
1 answer
Load Doc2Vec model and get new sentence's vectors for test
I have read lots of examples regarding doc2vec, but I couldn't find any answer. As a real example, I want to build a model with doc2vec and then train it with some ML models. After that, how can I get the vector of a raw string with the exact…
0
votes
2 answers
ELKI k-means clustering "Task failed" error for high-dimensional data
I have 60000 documents, which I processed in gensim to get a 60000*300 matrix. I exported this as a CSV file. When I import it into the ELKI environment and run k-means clustering, I get the error below.
Task…

StatguyUser
0
votes
1 answer
gensim: Retrieving word frequency in doc2vec vocabulary
I just came across this StackOverflow post on word counts in a doc2vec model vocabulary. I wonder if there is another method to retrieve the word frequency, other than
for word, vocab_obj in model.wv.vocab.items():
    print(str(word) +…

Christopher
0
votes
1 answer
Are the document vectors used in doc2vec one-hot?
I understand conceptually how word2vec and doc2vec work, but am struggling with the nuts and bolts of how the numbers in the vectors get processed algorithmically.
If the vectors for three context words are: [1000], [0100], [0010]
and the vector for…

mudstick
0
votes
1 answer
doc2vec: any way to fetch closest matching terms for a given vector?
My use case is a collection of "upvoted" documents and "downvoted" documents, which I want to use to re-order a set of search results.
I am using gensim doc2vec and am able to run the most_similar queries for word(s) and fetch…

Santino
0
votes
1 answer
Which way of recovering a doc2vec model is more efficient?
After I train a doc2vec model, I want to reuse the document vectors in another module. It seems there are two ways to implement this: save the model, or save the doc-vectors as a dictionary.
I just wonder which one is more memory-efficient and which one…

YangGuo
0
votes
1 answer
Parameter values of Doc2vec for Document Tagging - Gensim
My task is to assign tags (descriptive words) to documents or posts from the list of available tags. I'm working with Doc2vec available in Gensim. I read that doc2vec can be used for document tagging. But I could not get the suitable parameter…

Rabia
0
votes
1 answer
Getting vector-tag pairs after training in words2vec
I am trying to convert a bunch of poems into vectors, and then use my own implementation of k-means on them, but I can't figure out how to get the vectors with tags attached after training in doc2vec. I also find that when I train on 11 files I get…

tharvey
0
votes
1 answer
Issues in doc2vec tags in Gensim
I am using gensim doc2vec as below.
from gensim.models import doc2vec
from collections import namedtuple
import re
my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}
docs =…
user8566323