Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
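
As a rough illustration of the two modes, the sketch below builds the (input, target) training pairs each mode would use for one tagged document. It uses only the standard library and does no actual training. The function names `pv_dbow_pairs` and `pv_dm_pairs`, the sample document, and the window size are hypothetical, chosen for illustration only. In Gensim, the mode is selected with the `dm` parameter of `Doc2Vec` (`dm=0` for PV-DBOW, `dm=1` for PV-DM).

```python
from typing import List, Tuple


def pv_dbow_pairs(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW: the document tag alone is the input used to predict each word.

    This mirrors skip-gram, with the document tag in place of the center word.
    """
    return [(tag, target) for target in words]


def pv_dm_pairs(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM: the document tag plus nearby context words predict the target word.

    This mirrors CBOW, with the document tag added to the context.
    """
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]       # up to `window` words before
        right = words[i + 1:i + 1 + window]      # up to `window` words after
        pairs.append(((tag, *left, *right), target))
    return pairs


# Hypothetical sample document, for illustration only.
doc_tag = "DOC_0"
doc_words = ["the", "movie", "was", "scary"]

dbow = pv_dbow_pairs(doc_tag, doc_words)
dm = pv_dm_pairs(doc_tag, doc_words, window=1)

print(dbow)
print(dm)
```

In a real implementation the inputs are vectors that are looked up and, in PV-DM, averaged or concatenated before the prediction step; the pairs above only show which items play the input and target roles.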

556 questions

1 answer

Multiple tags for single document in doc2vec. TaggedDocument

Is it possible to train a doc2vec model where a single document has multiple tags? For example, in movie reviews, doc0 = doc2vec.TaggedDocument(words=review0,tags=['UID_0','horror','action']) doc1 =…

asked Sep 06 '17 at 19:52

unknown_jy

1 answer

Difference between TaggedDocument and TaggedLineDocument in gensim? and How to work with files in a directory?

I am new to doc2vec and I wish to classify a set of texts using it. I am confused about TaggedDocument and TaggedLineDocument. 1) What is the difference between the two? Is it that TaggedLineDocument is a collection of TaggedDocuments? 2) If I have a…

nlp gensim word2vec text-classification doc2vec

asked Jul 11 '17 at 23:34

dfault

1 answer

scikit-learn classification using doc2vec representation

I want to classify text documents using a doc2vec representation and scikit-learn models. My problem is that I'm lost on how to get started. Can someone explain the general steps usually taken to use doc2vec with scikit-learn?

machine-learning scikit-learn text-classification doc2vec

asked Nov 27 '16 at 20:19

MikeAlbert

2 answers

How to get word vectors from a gensim Doc2Vec?

I trained a gensim.models.doc2vec.Doc2Vec model with d2v_model = Doc2Vec(sentences, size=100, window=8, min_count=5, workers=4), and I can get document vectors with docvec = d2v_model.docvecs[0]. How can I get word vectors from the trained model?

gensim word2vec doc2vec

asked May 19 '16 at 23:49

V Y

1 answer

My Doc2Vec code, after many loops/epochs of training, isn't giving good results. What might be wrong?

I'm training a Doc2Vec model using the below code, where tagged_data is a list of TaggedDocument instances I set up before: max_epochs = 40 model = Doc2Vec(alpha=0.025, min_alpha=0.001) model.build_vocab(tagged_data) for epoch in…

gensim word2vec doc2vec

asked Jul 08 '20 at 18:10

gojomo

1 answer

How to perform efficient queries with Gensim doc2vec?

I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v.3.7.1, and I have trained both word2vec and doc2vec models. The…

python gensim similarity doc2vec sentence-similarity

asked May 14 '19 at 12:06

María Benavente

1 answer

Doc2vec beyond beginner guidance

I've been using doc2vec in the most basic way so far, with limited success. I'm able to find similar documents; however, I often get a lot of false positives. My primary goal is to build a classification algorithm for user requirements. This is to…

python dataframe gensim doc2vec

asked Mar 25 '19 at 10:07

Philip Wilson

3 answers

Doc2Vec & classification - very poor results

I have a dataset of 6000 observations; a sample of it is the following: job_id job_title job_sector 30018141 Secondary Teaching Assistant Education 30006499 Legal Sales…

python classification gensim text-classification doc2vec

asked Mar 22 '19 at 23:51

Outcast

2 answers

Cosine Similarity between Lists of Sentences using Doc2Vec

I'm new to NLP but I'm trying to match a list of sentences to another list of sentences in Python based on their semantic similarity. For example, list1 = ['what they ate for lunch', 'height in inches', 'subjectid'] list2 = ['food eaten two days…

python-3.x nlp data-science cosine-similarity doc2vec

asked Mar 08 '19 at 16:40

m13op22

1 answer

Doc2Vec Clustering with kmeans for a new document

I have a corpus trained with Doc2Vec as follows: d2vmodel = Doc2Vec(vector_size=100, min_count=5, epochs=10) d2vmodel.build_vocab(train_corpus) d2vmodel.train(train_corpus, total_examples=d2vmodel.corpus_count, epochs=d2vmodel.epochs) Using the…

cluster-analysis k-means doc2vec

asked Dec 07 '18 at 05:19

kami

1 answer

doc2vec: measurement of performance and 'workers' parameter

I have an awfully large corpus as input to my doc2vec training, around 23 million documents streamed using an iterable function. I was wondering if it were at all possible to see the development of my training progress, possibly through finding out…

python nlp multiprocessing word2vec doc2vec

asked Dec 05 '18 at 19:13

apgsov

2 answers

Doc2vec predictions - do we average the words or what is the paragraph ID for a new paragraph?

I understand that you treat the paragraph ID as a new word in doc2vec (DM approach, left on the figure) during training. The training output is the context word. After a model is trained, suppose I want to get 1 embedding given a new document. Do I…

nlp word2vec word-embedding doc2vec

asked Oct 26 '18 at 08:42

dorien

1 answer

How to find the most decisive sentences or words in a document via Doc2Vec?

I've trained a Doc2Vec model in order to do a simple binary classification task, but I would also love to see which words or sentences weigh more in terms of contributing to the meaning of a given text. So far I had no luck finding anything relevant…

python nlp gensim word2vec doc2vec

asked Aug 11 '18 at 09:18

Farhood ET

1 answer

Paragraph Vector or Doc2vec model size

I am using the deeplearning4j Java library to build a paragraph vector model (doc2vec) of dimension 100. I am using a text file. It has around 17 million lines, and the size of the file is 330 MB. I can train the model and calculate paragraph vector which…

nlp gensim word-embedding doc2vec deeplearning4j

asked Jun 20 '18 at 10:17

tired and bored dev

1 answer

Gensim Doc2Vec getting the doc tags from the Concatenated model

I'm trying to replicate Mikolov's work of PV-DM + PV-DBOW. He says that both algorithms should be used in order to get better results. For this reason I'm trying to train the model and then give the document tags to t-SNE. Using Gensim's Doc2Vec I…

python model gensim doc2vec

asked May 25 '18 at 18:51

Carlos Martin del Campo
