Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
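As a rough illustration of the difference between the two modes, the sketch below only builds the (input, target) training examples each mode would learn from; it does not train anything. The helper names `pv_dbow_examples` and `pv_dm_examples` are hypothetical and are not part of Gensim, and the symmetric context window is a simplifying assumption. In Gensim itself the mode is chosen with the `dm` parameter of `Doc2Vec` (`dm=0` for PV-DBOW, `dm=1` for PV-DM).

```python
from typing import List, Tuple


def pv_dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW sketch: the document tag alone is the input used to predict each word."""
    return [(tag, word) for word in words]


def pv_dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM sketch: the tag plus nearby context words jointly predict the target word.

    Assumption: a symmetric window of `window` words on each side of the target.
    """
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        examples.append(((tag, *left, *right), target))
    return examples


# Hypothetical toy document with a made-up tag.
doc = ["the", "cat", "sat", "down"]
dbow = pv_dbow_examples("DOC_0", doc)
dm = pv_dm_examples("DOC_0", doc, window=1)

for inputs, target in dm:
    print(inputs, "->", target)
```

In PV-DBOW every example has the same single input (the tag), much as skip-gram predicts words from one input word; in PV-DM the tag is combined with context words, much as CBOW combines a window of words.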

556 questions
2
votes
2 answers

Inaccurate similarity results from doc2vec using the gensim library

I am working with the Gensim library to train on some data files using doc2vec. While trying to test the similarity of one of the files using the method model.docvecs.most_similar("file"), I always get all the results above 91% with almost no difference…
Abdessamad139
2
votes
0 answers

Error while using doc2vec

I am trying to generate vectors from a list of sentences. x1 = 'Today I’d like to start a series of some posts concerning extreme value analysis using R.' x2 = 'Basically, there are several very useful packages in R which provide methods and…
mommomonthewind
2
votes
1 answer

Gensim doc2vec most_similar equivalent to get full documents

In Gensim's doc2vec implementation, gensim.models.keyedvectors.Doc2VecKeyedVectors.most_similar returns the tags and cosine similarity of the documents most similar to the query document. What if I want the actual documents themselves and not the…
Syncrossus
2
votes
1 answer

doc2vec get most similar document

I am struggling to understand the usage of doc2vec. I trained a toy model on a set of documents using some sample code I found by googling. Next I want to find the document that the model considers to be the closest match to documents in my training…
Lin Endian
2
votes
1 answer

gensim: 'Doc2Vec' object has no attribute 'intersect_word2vec_format' when I load the Google pre-trained word2vec model

I get this error when I load the Google pre-trained word2vec model to train a doc2vec model with my own data. Here is part of my…
Xizi Wei
2
votes
1 answer

Document classification using word vectors

While I was classifying and clustering documents written in natural language, I came up with a question ... As word2vec, GloVe, and similar methods vectorize words in a distributed space, I wonder if there is any method recommended or commonly…
Isaac Sim
2
votes
0 answers

Gensim worker thread stuck

I am training document embeddings on ~20 million sentences and using parallel processing in gensim. I'm creating my model and training with the following code: class read_corpus(object): def __init__(self, fname, n): self.fname =…
bbrodrigues
2
votes
1 answer

How to use vectors from Doc2Vec in Tensorflow

I am trying to use Doc2Vec to convert sentences to vectors, then use those vectors to train a tensorflow classifier. I am a little confused at what tags are used for, and how to extract all of the document vectors from Doc2Vec after it has finished…
2
votes
1 answer

gensim doc2vec model returns ids not related to the input

I created a model from news stored in a MongoDB database and tagged the documents by Mongo collection id: from gensim.models.doc2vec import TaggedDocument i=0 docs=[] for artical in lstcontent: doct = TaggedDocument(clean_str(artical), [lstids[i]]) …
2
votes
1 answer

human-interpretable, meaningful clusters using doc2vec

I am clustering a set of education documents using doc2vec. As a human, I think of these as falling into categories such as: computer-related, language-related, collaboration, arts, etc. I wonder if there is a way to 'guide' the doc2vec clustering into a set…
2
votes
1 answer

How to properly tag a list of documents with Gensim TaggedDocument()

I would like to tag a list of documents with Gensim TaggedDocument(), and then pass these documents as input to Doc2Vec(). I have read the documentation about TaggedDocument here, but I have not understood what exactly the parameters words…
Simone
2
votes
1 answer

Load a saved Doc2Vec model in Colab

I have trained and saved a model with doc2vec in colab as model = gensim.models.Doc2Vec(vector_size=size_of_vector, window=10, min_count=5, workers=16,alpha=0.025, min_alpha=0.025, epochs=40) model.build_vocab(allXs) model.train(allXs,…
Valerio D. Ciotti
2
votes
0 answers

How to find the relation between two columns of a csv file (containing labels and related data) using doc2vec?

I am working on a problem related to doc2vec where I need to find labels that are related to a particular word. For example (csv file): Data Label / Tags In a future world devastated by…
Sahil Singla
2
votes
1 answer

Export gensim doc2vec embeddings into separate file to use with keras Embedding layer later

I am a bit new to gensim and right now I am trying to solve a problem that involves using doc2vec embeddings in keras. I wasn't able to find an existing implementation of doc2vec in keras - as far as I can see, in all the examples I have found so far everyone…
Maksim Khaitovich
2
votes
1 answer

Set Batch Size *and* Number of Training Iterations for a neural network?

I am using the KNIME Doc2Vec Learner node to build a Word Embedding. I know how Doc2Vec works. In KNIME I have the option to set the parameters Batch Size: The number of words to use for each batch. Number of Epochs: The number of epochs to…
Make42