Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag Of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
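The difference between the two modes can be illustrated with a toy sketch of which inputs are asked to predict which target word. This is only an illustration under simplifying assumptions, not Gensim's implementation: there is no neural network, no sampling, and the function names `pv_dbow_examples` and `pv_dm_examples` are hypothetical helpers made up for this example.

```python
from typing import List, Tuple

Example = Tuple[Tuple[str, ...], str]  # (inputs, target word)


def pv_dbow_examples(tag: str, words: List[str]) -> List[Example]:
    """PV-DBOW: the document tag alone is the input for every target word."""
    return [((tag,), target) for target in words]


def pv_dm_examples(tag: str, words: List[str], window: int = 2) -> List[Example]:
    """PV-DM: the document tag plus nearby context words predict the target."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]      # up to `window` words before
        right = words[i + 1:i + 1 + window]     # up to `window` words after
        examples.append(((tag, *left, *right), target))
    return examples


doc = ["the", "cat", "sat", "down"]
dbow = pv_dbow_examples("doc_0", doc)
dm = pv_dm_examples("doc_0", doc, window=1)

print(dbow[1])  # (('doc_0',), 'cat')
print(dm[1])    # (('doc_0', 'the', 'sat'), 'cat')
```

In PV-DBOW the word order inside the document plays no role in the inputs, while in PV-DM the surrounding words are combined with the document tag, which is why the two modes are compared to skip-gram and CBOW respectively.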

556 questions

2 answers

Inaccurate similarity results from doc2vec using the gensim library

I am working with the Gensim library to train doc2vec on some data files. When I test the similarity of one of the files using the method model.docvecs.most_similar("file"), all the results come back above 91% with almost no difference…

asked Sep 12 '18 at 01:44

Abdessamad139

0 answers

Error while using doc2vec

I am trying to generate vectors from a list of sentences. x1 = 'Today I’d like to start a series of some posts concerning extreme value analysis using R.' x2 = 'Basically, there are several very useful packages in R which provide methods and…

python-3.x doc2vec

asked Jun 26 '18 at 10:10

mommomonthewind

1 answer

Gensim doc2vec most_similar equivalent to get full documents

In Gensim's doc2vec implementation, gensim.models.keyedvectors.Doc2VecKeyedVectors.most_similar returns the tags and cosine similarity of the documents most similar to the query document. What if I want the actual documents themselves and not the…

python-3.x nlp text-mining gensim doc2vec

asked May 25 '18 at 13:50

Syncrossus

1 answer

doc2vec get most similar document

I am struggling to understand the usage of doc2vec. I trained a toy model on a set of documents using some sample code I found by googling. Next I want to find the document that the model considers to be the closest match to documents in my training…

python machine-learning gensim doc2vec

asked May 20 '18 at 19:03

Lin Endian

1 answer

gensim: 'Doc2Vec' object has no attribute 'intersect_word2vec_format' when I load the Google pre-trained word2vec model

I get this error when I load the Google pre-trained word2vec model to train a doc2vec model with my own data. Here is part of my…

word2vec gensim doc2vec

asked May 08 '18 at 15:28

Xizi Wei

1 answer

Document classification using word vectors

While classifying and clustering documents written in natural language, I came up with a question ... Since word2vec, GloVe, and similar methods vectorize words in a distributed space, I wonder if there is any method recommended or commonly…

machine-learning nlp vectorization word2vec doc2vec

asked May 08 '18 at 03:21

Isaac Sim

0 answers

Gensim worker thread stuck

I am training document embeddings on ~20 million sentences and using parallel processing in gensim. I'm creating my model and training with the following code class read_corpus(object): def __init__(self, fname, n): self.fname =…

python nlp word2vec gensim doc2vec

asked Apr 29 '18 at 17:42

bbrodrigues

1 answer

How to use vectors from Doc2Vec in Tensorflow

I am trying to use Doc2Vec to convert sentences to vectors, then use those vectors to train a tensorflow classifier. I am a little confused about what tags are used for, and how to extract all of the document vectors from Doc2Vec after it has finished…

python tensorflow nlp word2vec doc2vec

asked Apr 22 '18 at 16:00

Damian Reiter

1 answer

gensim doc2vec model returns ids not related to the input

I created a model from news stored in a mongodb database and tagged the documents by mongo collection id from gensim.models.doc2vec import TaggedDocument i=0 docs=[] for artical in lstcontent: doct = TaggedDocument(clean_str(artical), [lstids[i]]) …

word2vec gensim cosine-similarity doc2vec

asked Apr 14 '18 at 09:10

abdalmohaymen aliesmaeel

1 answer

human-interpretable, meaningful clusters using doc2vec

I am clustering a set of education documents using doc2vec. As a human, I think of these as falling into categories such as: computer-related, language-related, collaboration, arts, etc. I wonder if there is a way to 'guide' the doc2vec clustering into a set…

text cluster-analysis word2vec text-classification doc2vec

asked Apr 13 '18 at 07:58

user7400474

1 answer

How to properly tag a list of documents with Gensim TaggedDocument()

I would like to tag a list of documents with Gensim TaggedDocument(), and then pass these documents as the input of Doc2Vec(). I have read the documentation about TaggedDocument here, but I haven't understood what exactly the parameters words…

nlp gensim doc2vec

asked Apr 03 '18 at 16:50

Simone

1 answer

Load a saved Doc2Vec model in Colab

I have trained and saved a model with doc2vec in colab as model = gensim.models.Doc2Vec(vector_size=size_of_vector, window=10, min_count=5, workers=16,alpha=0.025, min_alpha=0.025, epochs=40) model.build_vocab(allXs) model.train(allXs,…

python gensim google-colaboratory doc2vec

asked Mar 25 '18 at 16:30

Valerio D. Ciotti

0 answers

How to find the relation between two columns of a csv file (containing labels and related data) using doc2vec?

I am working on a problem related to doc2vec where I need to find labels that are related to a particular word. For example (csv file): Data Label / Tags In a future world devastated by…

tags keyword label word2vec doc2vec

asked Mar 05 '18 at 09:53

Sahil Singla

1 answer

Export gensim doc2vec embeddings into separate file to use with keras Embedding layer later

I am a bit new to gensim and right now I am trying to solve a problem which involves using doc2vec embeddings in keras. I wasn't able to find an existing implementation of doc2vec in keras - as far as I can see, in all the examples I have found so far everyone…

keras gensim word-embedding doc2vec

asked Feb 27 '18 at 00:19

Maksim Khaitovich

1 answer

Set Batch Size and Number of Training Iterations for a neural network?

I am using the KNIME Doc2Vec Learner node to build a Word Embedding. I know how Doc2Vec works. In KNIME I have the option to set the parameters Batch Size: The number of words to use for each batch. Number of Epochs: The number of epochs to…

neural-network doc2vec knime

asked Feb 22 '18 at 10:26

Make42
