Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
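As a rough illustration of the difference between the two modes, the sketch below lists the (inputs, target word) training examples each mode would form for one tiny document. It is a simplified, pure-Python sketch and not how Gensim implements training: the function names, the `doc_0` tag, and the fixed window are assumptions made for illustration, and real implementations add details such as randomized window sizes and negative sampling.

```python
def pv_dbow_pairs(tag, words):
    """PV-DBOW (skip-gram-like): the document tag alone predicts each word."""
    return [((tag,), word) for word in words]


def pv_dm_pairs(tag, words, window=2):
    """PV-DM (CBOW-like): the document tag plus nearby words predict the center word."""
    pairs = []
    for i, target in enumerate(words):
        # Context = up to `window` words on each side of the target word.
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append(((tag, *context), target))
    return pairs


# Hypothetical one-document corpus.
doc_words = ["red", "shoes", "with", "heels"]

dbow = pv_dbow_pairs("doc_0", doc_words)
dm = pv_dm_pairs("doc_0", doc_words, window=1)

for inputs, target in dbow:
    print("PV-DBOW", inputs, "->", target)
for inputs, target in dm:
    print("PV-DM  ", inputs, "->", target)
```

In both modes the document tag appears in every training example, which is how the tag's vector comes to summarize the whole document.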

556 questions

1 answer

Hierarchical training for doc2vec: how would assigning same labels to sentences of the same document work?

What is the effect of assigning the same label to a bunch of sentences in doc2vec? I have a collection of documents for which I want to learn vectors using gensim, for a "file" classification task where a file refers to a collection of documents for a given…

asked Jun 24 '18 at 22:25

HMK

1 answer

Doc2Vec input format

I am running gensim Doc2Vec on Ubuntu. Doc2Vec rejects my input with the error AttributeError: 'list' object has no attribute 'words'. import gensim from gensim.models import doc2vec as dtv from nltk.corpus import brown documents =…

gensim doc2vec

asked Jun 22 '18 at 16:29

Lcat
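An error like the one in the question above is what one would expect when plain token lists are passed where gensim expects objects exposing `words` and `tags` attributes, as `gensim.models.doc2vec.TaggedDocument` does. The following is a minimal sketch of wrapping token lists that way. It uses a stand-in namedtuple with the same two fields so that it runs without gensim installed; with gensim available, `TaggedDocument` would be imported from `gensim.models.doc2vec` instead. The sample sentences are hypothetical.

```python
from collections import namedtuple

# Stand-in with the same two fields as gensim's TaggedDocument
# (assumption: used here only so the sketch runs without gensim).
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Hypothetical tokenized sentences; a plain list has no `.words` attribute.
raw_sentences = [["the", "jury", "said"], ["it", "was", "fair"]]

# Wrap each token list, using its position in the corpus as a unique tag.
tagged = [
    TaggedDocument(words=tokens, tags=[i])
    for i, tokens in enumerate(raw_sentences)
]

has_words_before = hasattr(raw_sentences[0], "words")
has_words_after = hasattr(tagged[0], "words")
print(has_words_before, has_words_after)
```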

1 answer

Can doc2vec be useful if training on Documents and inferring on sentences only

I am training gensim's Doc2Vec on some documents. I have two types of inputs: Whole English Wikipedia: Each article of Wikipedia text is considered as one document for doc2vec training. (Total around 5.5 million articles or…

python gensim training-data doc2vec

asked Jun 05 '18 at 05:38

DK818

0 answers

Doc2vec most_similar method returns similarity score higher than 1

I have trained a doc2vec model on 500,000 documents by following this tutorial: https://github.com/abtpst/Doc2Vec/blob/master/trainDoc2Vec.py However, when I try to find most_similar documents for a given document, the results have similarity higher…

python python-3.x gensim doc2vec

asked Jun 04 '18 at 16:21

akoksal

1 answer

Gensim Doc2Vec trims and deletes the vocabulary

I tried creating a simple Doc2Vec model: sentences = [] sentences.append(doc2vec.TaggedDocument(words=[u'scarpe', u'rosse', u'con', u'tacco'], tags=[1])) sentences.append(doc2vec.TaggedDocument(words=[u'scarpe', u'blu'], tags=[2])) …

python gensim doc2vec vocabulary

asked May 28 '18 at 14:59

Nicolò Gasparini

0 answers

Vector representation for token and compound word

I have a corpus of sentences. Each of them may contain marked compound words. For example: This is an example_sentence followed by another awesome_paragraph. I want to get embedding vectors for all tokens and compound words (this, is, an,…

python machine-learning word2vec gensim doc2vec

asked May 16 '18 at 02:49

Brody

1 answer

Shape ValueError in LSTM network using Tensorflow

I want to train an LSTM model with Tensorflow. I have text data as input, and I get the doc2vec vector of each paragraph of the text and pass it to the LSTM layers, but I get a ValueError because of an inconsistency in shape rank. I've searched through Stackoverflow…

python tensorflow nlp lstm doc2vec

asked May 15 '18 at 14:27

Mina smz

1 answer

Normalize the similarity between word vectors and document vectors?

Cosine similarity is widely used for measuring the similarity between two vectors, where the two could be word vectors or document vectors. Other measures, like Manhattan, Euclidean, Minkowski, etc., are also popular. Cosine similarity gives a number between…

vector compare similarity word2vec doc2vec

asked May 15 '18 at 02:18

Isaac Sim
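For reference on the measure discussed in the question above, here is a minimal pure-Python sketch of cosine similarity, cos(a, b) = (a · b) / (|a| |b|). The function name and the sample vectors are illustrative assumptions and are not taken from the question.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different length: only the angle matters, not magnitude.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```

Because the result depends only on the angle between the vectors, it always falls in the range from -1 to 1, up to floating-point rounding.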

1 answer

semantic and syntactic performance of Doc2vec model

I am trying to check the semantic and syntactic performance of a doc2vec model with doc2vec_model.accuracy(questions-words), but it doesn't seem to work, since models.deprecated.doc2vec – Deep learning with paragraph2vec, says it has been deprecated…

python-3.x word-embedding doc2vec

asked Apr 28 '18 at 11:39

Dela

1 answer

How are vectors calculated in doc2vec and what does the size parameter depict?

If I pass a sentence containing 5 words to the Doc2Vec model and the size is 100, there are 100 vectors. I don't understand what those vectors are. If I increase the size to 200, there are 200 vectors for just a simple sentence. Please tell me how…

python-3.x nlp doc2vec

asked Apr 18 '18 at 11:13

Yash Ghorpade

0 answers

Matching between two separate documents using gensim doc2vec

I have two separate data sets: one is resumes and the other is demands. Using gensim doc2vec, I created a model for each, and I am able to query similar words in each data set. But now I need to merge these two models into one and query for resumes…

gensim doc2vec

asked Apr 13 '18 at 06:56

krits

2 answers

How to measure the word weight using doc2vec vector

I'm using the word2vec algorithm to detect the most important words in a document. My question is about how to compute the weight of an important word using the vector obtained from doc2vec. My code looks like this: model =…

python algorithm word-embedding doc2vec

asked Apr 08 '18 at 11:41

ucmou

0 answers

Document tags in vectorization models

I am a little new to Python and unsupervised learning methods, but I have a quick question. Whereas a doc2vec model has a docvecs property holding all trained vectors for the 'document tags' seen during training, are there similar properties that…

python vectorization word2vec doc2vec fasttext

asked Apr 05 '18 at 11:09

Dela

1 answer

Cosine similarity is 0.7 for exactly the same sentences

Cosine similarity for two exactly identical sentences is 0.7. Is my doc2vec model correct? I am using the Quora question pairs dataset available on Kaggle. In the code below, train1 is the list of first questions and train2 is the list of second…

python-3.5 doc2vec

asked Mar 31 '18 at 09:24

Gautam Kumar

1 answer

How to get most similar words to a document in gensim doc2vec?

I have built a gensim Doc2vec model. Let's call it doc2vec. Now I want to find the most relevant words to a given document according to my doc2vec model. For example, I have a document about "java" with the tag "doc_about_java". When I ask for…

word2vec gensim doc2vec

asked Mar 07 '18 at 00:46

aburkov
