Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
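To make the two modes concrete, here is a minimal stdlib-only sketch of the (inputs -> target) prediction examples each mode builds from one tagged document. It is not Gensim's code. The function names are hypothetical, and real training adds learned vectors, negative sampling or hierarchical softmax, and window-size sampling. In Gensim the mode is chosen with the `dm` parameter (`dm=1`, the default, for PV-DM; `dm=0` for PV-DBOW).

```python
# Sketch: the prediction examples each Doc2Vec mode trains on, for one
# tagged document. Illustrative only; not Gensim's implementation.

def pv_dbow_pairs(tag, words):
    """PV-DBOW: the document tag alone predicts each word (like skip-gram)."""
    return [((tag,), w) for w in words]


def pv_dm_examples(tag, words, window=2):
    """PV-DM: the document tag plus nearby context words predict the
    center word (like CBOW)."""
    examples = []
    for i, target in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        examples.append(((tag, *context), target))
    return examples


words = ["extreme", "value", "analysis", "in", "r"]
for inputs, target in pv_dbow_pairs("doc_0", words)[:2]:
    print(inputs, "->", target)
for inputs, target in pv_dm_examples("doc_0", words, window=1)[:2]:
    print(inputs, "->", target)
```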
Questions tagged [doc2vec]
556 questions
2 votes, 2 answers
Inaccurate similarity results from doc2vec using the gensim library
I am working with the Gensim library to train doc2vec on some data files. While trying to test the similarity of one of the files using the method model.docvecs.most_similar("file"), I always get all the results above 91% with almost no difference…

Abdessamad139
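`most_similar` ranks candidates by cosine similarity between document vectors. The stdlib sketch below reproduces that ranking step, using hypothetical helper names and made-up 3-dimensional vectors. It can help check whether uniformly high scores come from the vectors themselves rather than from the lookup.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|), in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(query_tag, vectors, topn=3):
    """Rank every other tag by cosine similarity to query_tag, highest first."""
    q = vectors[query_tag]
    scored = [(tag, cosine(q, vec)) for tag, vec in vectors.items() if tag != query_tag]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up vectors, for illustration only.
vectors = {
    "file_a": [1.0, 0.0, 0.0],
    "file_b": [0.9, 0.1, 0.0],
    "file_c": [0.0, 1.0, 0.0],
}
print(most_similar("file_a", vectors))
```

If every score lands in a narrow high band, the trained vectors are nearly parallel. In that case the place to look is training (corpus size, parameters), not the similarity lookup.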
2 votes, 0 answers
Error while using doc2vec
I am trying to generate vectors from a list of sentences.
x1 = 'Today I’d like to start a series of some posts concerning extreme value analysis using R.'
x2 = 'Basically, there are several very useful packages in R which provide methods and…

mommomonthewind
2 votes, 1 answer
Gensim doc2vec most_similar equivalent to get full documents
In Gensim's doc2vec implementation, gensim.models.keyedvectors.Doc2VecKeyedVectors.most_similar returns the tags and cosine similarity of the documents most similar to the query document. What if I want the actual documents themselves and not the…

Syncrossus
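Gensim's `most_similar` returns `(tag, similarity)` pairs, and the model does not keep the training texts. One straightforward approach is to keep your own tag-to-text dictionary and look each returned tag up in it. The sketch below assumes such a dictionary; the names and data are hypothetical.

```python
# Hypothetical corpus: tag -> original text, kept alongside the model.
documents = {
    "doc_0": "Today I'd like to start a series of posts on extreme value analysis.",
    "doc_1": "There are several useful packages in R for these methods.",
    "doc_2": "Gensim implements Doc2Vec in Python.",
}

# Stand-in for what a most_similar() call returns: (tag, cosine similarity) pairs.
similar = [("doc_1", 0.82), ("doc_2", 0.41)]


def with_documents(similar_pairs, docs_by_tag):
    """Replace each tag with its full document, keeping the similarity score."""
    return [(docs_by_tag[tag], score) for tag, score in similar_pairs]


for text, score in with_documents(similar, documents):
    print(f"{score:.2f}  {text}")
```

Using each document's list position or database id as its tag keeps this lookup trivial.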
2 votes, 1 answer
doc2vec get most similar document
I am struggling to understand the usage of doc2vec. I trained a toy model on a set of documents using some sample code I found by googling. Next I want to find the document that the model considers to be the closest match to documents in my training…

Lin Endian
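A sanity check in the spirit of Gensim's Doc2Vec tutorial is to re-infer a vector for each training document and see where its own tag ranks among all trained vectors. The sketch below uses made-up vectors in place of trained and inferred ones, plus a hypothetical `self_rank` helper. In Gensim, those vectors would come from the model's document vectors and `infer_vector`.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def self_rank(tag, inferred, trained):
    """0-based position of `tag` when all trained vectors are ranked by cosine
    similarity to `inferred` (0 means the document finds itself first)."""
    ranked = sorted(trained, key=lambda t: cosine(inferred, trained[t]), reverse=True)
    return ranked.index(tag)


# Made-up vectors: trained doc vectors and re-inferred vectors for the same docs.
trained = {"doc_0": [1.0, 0.1], "doc_1": [0.1, 1.0], "doc_2": [0.7, 0.7]}
inferred = {"doc_0": [0.9, 0.2], "doc_1": [0.2, 0.9], "doc_2": [0.95, 0.3]}

ranks = Counter(self_rank(tag, inferred[tag], trained) for tag in trained)
print(ranks)
```

A healthy model ranks most documents first (rank 0). Here `doc_2` lands second, which is the kind of miss the check is meant to surface.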
2 votes, 1 answer
gensim: 'Doc2Vec' object has no attribute 'intersect_word2vec_format' when I load the Google pre-trained word2vec model
I get this error when I load the Google pre-trained word2vec model to train a doc2vec model with my own data. Here is part of my…

Xizi Wei
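"Intersecting" means that only words already in the model's vocabulary take their vectors from the pre-trained file, and everything else is ignored. The stdlib sketch below shows that idea for the plain-text word2vec format only (the GoogleNews file is binary). It is not Gensim's implementation, and the helper name is hypothetical. As an assumption to verify against your installed version's documentation: recent Gensim releases expose `intersect_word2vec_format` on the KeyedVectors object (`model.wv`) rather than on the model class.

```python
import io


def intersect_text_vectors(vocab_vectors, text_file):
    """Overwrite vectors for words already in vocab_vectors with those found in
    a word2vec *text*-format file (header "<count> <dim>", then one
    "<word> <v1> ... <vdim>" line per word). Unknown words are ignored.
    Returns the number of vocabulary words updated."""
    count, dim = (int(x) for x in text_file.readline().split())
    updated = 0
    for _ in range(count):
        parts = text_file.readline().split()
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"bad vector length for {word!r}")
        if word in vocab_vectors:
            vocab_vectors[word] = values
            updated += 1
    return updated


# Hypothetical model vocabulary with placeholder vectors.
vocab = {"value": [0.0, 0.0], "analysis": [0.0, 0.0]}
pretrained = io.StringIO("3 2\nvalue 0.5 -0.5\nquery 1.0 1.0\nanalysis 0.25 0.75\n")
print(intersect_text_vectors(vocab, pretrained), vocab)
```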
2 votes, 1 answer
Document classification using word vectors
While I was classifying and clustering documents written in natural language, a question came up...
As word2vec, GloVe, etc. vectorize words in a distributed space, I wonder if there are any methods recommended or commonly…

Isaac Sim
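One common baseline is to average the word vectors of a document and feed the result to an ordinary classifier or clustering algorithm. This is an alternative to Doc2Vec, not something the question prescribes. A stdlib sketch with made-up vectors and a hypothetical helper name:

```python
def average_vector(tokens, word_vectors):
    """Mean of the vectors of in-vocabulary tokens; None if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Made-up 2-dimensional word vectors, for illustration only.
word_vectors = {"extreme": [1.0, 0.0], "value": [0.0, 1.0], "analysis": [1.0, 1.0]}
doc = "extreme value analysis using r".split()
print(average_vector(doc, word_vectors))
```

Out-of-vocabulary tokens (`using` and `r` above) are skipped. If no token is known, the function returns `None` so the caller can decide how to handle the empty case.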
2 votes, 0 answers
Gensim worker thread stuck
I am training document embeddings on ~20 million sentences and using parallel processing in gensim. I'm creating my model and training it with the following code:
class read_corpus(object):
    def __init__(self, fname, n):
        self.fname =…

bbrodrigues
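Separately from the stuck-thread issue, a corpus class like `read_corpus` must be re-iterable. Gensim passes over the corpus once to build the vocabulary and again for each training epoch, and a plain generator is exhausted after one pass. The minimal sketch below assumes one document per line and uses a hypothetical `LineCorpus` class plus a namedtuple stand-in for `TaggedDocument`.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in with the same two fields as gensim's TaggedDocument (words, tags).
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


class LineCorpus:
    """Hypothetical streaming corpus: one document per line of a text file.

    Each call to __iter__ reopens the file, so the corpus can be iterated
    many times, unlike a generator, which is exhausted after one pass."""

    def __init__(self, fname):
        self.fname = fname

    def __iter__(self):
        with open(self.fname, encoding="utf-8") as fh:
            for line_no, line in enumerate(fh):
                yield TaggedDocument(line.lower().split(), [line_no])


# Demo with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("First sentence here\nSecond sentence\n")
corpus = LineCorpus(tmp.name)
first_pass = list(corpus)
second_pass = list(corpus)  # works again because __iter__ restarts
os.remove(tmp.name)
print(len(first_pass), len(second_pass))
```

Gensim also ships a ready-made class for the one-document-per-line case, `gensim.models.doc2vec.TaggedLineDocument`.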
2 votes, 1 answer
How to use vectors from Doc2Vec in Tensorflow
I am trying to use Doc2Vec to convert sentences to vectors, then use those vectors to train a tensorflow classifier.
I am a little confused about what tags are used for, and how to extract all of the document vectors from Doc2Vec after it has finished…

Damian Reiter
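Tags are the keys Doc2Vec learns one vector for. In practice they let you match each vector back to its document and label. The sketch below uses a hypothetical helper and made-up vectors that stand in for per-tag lookups such as `model.dv[tag]` in Gensim 4. It builds row-aligned `X` and `y` lists that can then be converted into arrays for a TensorFlow model.

```python
def to_training_arrays(doc_vectors, labels):
    """Build row-aligned feature rows X and labels y from two tag-keyed dicts.

    Sorting the tags fixes a reproducible row order, so row i of X and
    entry i of y always describe the same document."""
    tags = sorted(labels)
    X = [list(doc_vectors[tag]) for tag in tags]
    y = [labels[tag] for tag in tags]
    return X, y


# Made-up vectors (stand-ins for per-tag lookups on a trained model) and labels.
doc_vectors = {"s1": [0.1, 0.3], "s0": [0.5, -0.2], "s2": [0.0, 0.9]}
labels = {"s0": "positive", "s1": "negative", "s2": "positive"}
X, y = to_training_arrays(doc_vectors, labels)
print(X, y)
```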
2 votes, 1 answer
gensim doc2vec model returns ids not related to the input
I created a model from news stored in a MongoDB database and tagged the documents by their Mongo collection id:
from gensim.models.doc2vec import TaggedDocument
i=0
docs=[]
for artical in lstcontent:
    doct = TaggedDocument(clean_str(artical), [lstids[i]])
    …
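The excerpt does not show what `clean_str` returns. If it returns one string rather than a list of tokens, each character is treated as a word, which is a common cause of unrelated results. A stdlib illustration with a namedtuple stand-in for `TaggedDocument` and made-up text:

```python
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

text = "stocks fell sharply"
wrong = TaggedDocument(text, ["id_1"])          # words is a plain string
right = TaggedDocument(text.split(), ["id_1"])  # words is a list of tokens

# Iterating a string yields characters, so the "words" become single letters.
print(list(wrong.words)[:5])
print(list(right.words))
```

Iterating with `zip(lstcontent, lstids)` would also remove the need for the manual `i` counter.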
2 votes, 1 answer
human-interpretable, meaningful clusters using doc2vec
I am clustering a set of education documents using doc2vec.
As a human, I think of these as falling into categories such as:
computer-related
language related
collaboration
arts
etc.
I wonder if there is a way to 'guide' the doc2vec clustering into a set…

user7400474
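Doc2Vec training is unsupervised. One option for steering results toward chosen categories is to hand-label a few example documents per category, average their vectors into a centroid, and assign every document to the most cosine-similar centroid. This is a swapped-in technique, not a Doc2Vec feature. A stdlib sketch with made-up 2-dimensional vectors and hypothetical helper names:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def assign(doc_vector, seeds):
    """Name of the category whose centroid is most cosine-similar."""
    return max(seeds, key=lambda name: cosine(doc_vector, seeds[name]))


# Made-up vectors of a few hand-labelled example documents per category.
examples = {
    "computer-related": [[0.9, 0.1], [0.8, 0.2]],
    "arts": [[0.1, 0.9], [0.2, 0.7]],
}
seeds = {name: centroid(vecs) for name, vecs in examples.items()}
print(assign([0.7, 0.3], seeds), assign([0.3, 0.8], seeds))
```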
2 votes, 1 answer
How to properly tag a list of documents with Gensim TaggedDocument()
I would like to tag a list of documents with Gensim TaggedDocument(), and then pass these documents as input to Doc2Vec().
I have read the documentation about TaggedDocument here, but I don't understand what exactly the parameters words…

Simone
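`TaggedDocument` has two fields. `words` is a list of string tokens for one document, and `tags` is a list of tags (ints or strings), each of which receives a learned vector. The stdlib sketch below builds such a list using a namedtuple stand-in and a hypothetical, deliberately simple tokenizer.

```python
import re
from collections import namedtuple

# Stand-in with the same two fields as gensim's TaggedDocument:
#   words: list of string tokens for one document
#   tags:  list of tags (ints or strings), each getting its own learned vector
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def tokenize(text):
    """Very simple lowercase word tokenizer (illustrative, not gensim's)."""
    return re.findall(r"[a-z0-9']+", text.lower())


raw_docs = ["Doc2Vec learns document vectors.", "Tags identify each document."]
tagged = [TaggedDocument(tokenize(doc), [i]) for i, doc in enumerate(raw_docs)]
print(tagged[0])
```

With Gensim installed, the real class comes from `from gensim.models.doc2vec import TaggedDocument`, and the resulting list can be passed to `Doc2Vec`.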
2 votes, 1 answer
Load a saved Doc2Vec model in Colab
I have trained and saved a model with doc2vec in Colab as:
model = gensim.models.Doc2Vec(vector_size=size_of_vector, window=10, min_count=5, workers=16,alpha=0.025, min_alpha=0.025, epochs=40)
model.build_vocab(allXs)
model.train(allXs,…

Valerio D. Ciotti
2 votes, 0 answers
How to find the relation between two columns of a CSV file (containing labels and related data) using doc2vec?
I am working on a problem related to doc2vec where I need to find labels that are related to a particular word. For example (CSV file):
Data Label / Tags
In a future world devastated by…

Sahil Singla
2 votes, 1 answer
Export gensim doc2vec embeddings into separate file to use with keras Embedding layer later
I am a bit new to gensim, and right now I am trying to solve a problem that involves using doc2vec embeddings in Keras. I wasn't able to find an existing implementation of doc2vec in Keras; as far as I see, in all examples I found so far everyone…

Maksim Khaitovich
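Two pieces are usually needed. The first is a file other tools can read. The second is a matrix whose row order matches the integer indices the Embedding layer will receive. The stdlib sketch below uses hypothetical helpers and made-up vectors. It writes the plain-text word2vec layout (header line, then one key and its values per line), which is the layout Gensim's `KeyedVectors.save_word2vec_format` uses in non-binary mode. It also builds a tag-to-row index.

```python
import io


def write_word2vec_text(vectors, out):
    """Write key -> vector pairs in the plain-text word2vec layout:
    a header "<count> <dim>", then one "<key> <v1> <v2> ..." line per entry."""
    dim = len(next(iter(vectors.values())))
    out.write(f"{len(vectors)} {dim}\n")
    for key, vec in vectors.items():
        out.write(key + " " + " ".join(repr(float(x)) for x in vec) + "\n")


def embedding_matrix(vectors):
    """Return (index_of_key, rows), where rows[i] is the vector of the key
    whose index is i."""
    keys = list(vectors)
    return {k: i for i, k in enumerate(keys)}, [list(vectors[k]) for k in keys]


# Made-up document vectors keyed by tag.
doc_vectors = {"doc_0": [0.25, -1.0], "doc_1": [0.5, 0.125]}
buf = io.StringIO()
write_word2vec_text(doc_vectors, buf)
print(buf.getvalue())
index, rows = embedding_matrix(doc_vectors)
```

`rows` can then be passed (as an array) as the initial weights of a Keras `Embedding` layer, with `index` mapping each tag to its input id.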
2 votes, 1 answer
Set Batch Size *and* Number of Training Iterations for a neural network?
I am using the KNIME Doc2Vec Learner node to build a Word Embedding. I know how Doc2Vec works. In KNIME I have the option to set the parameters:
Batch Size: The number of words to use for each batch.
Number of Epochs: The number of epochs to…

Make42
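The two settings are independent, as generic mini-batch arithmetic shows. Batch size fixes how many examples feed each update, and epochs fix how many full passes are made, so together they determine the number of updates. This is a general illustration, not a description of the KNIME node's internals. Hypothetical helper, made-up numbers:

```python
import math


def training_steps(total_words, batch_size, epochs):
    """Batches per epoch and total batch updates, assuming every epoch covers
    all words once and the last partial batch is still processed."""
    per_epoch = math.ceil(total_words / batch_size)
    return per_epoch, per_epoch * epochs


print(training_steps(total_words=1_000_000, batch_size=512, epochs=10))
```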