Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm that converts documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
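To make the difference between the two modes concrete, here is a minimal sketch in plain Python of the (input, target) prediction tasks each mode sets up. It is an illustration only, not Gensim's implementation: the function names, the toy document, and the symmetric context window are assumptions made for the example, and real implementations add sampling, negative sampling or hierarchical softmax, and the actual vector updates.

```python
from typing import List, Tuple


def pv_dbow_examples(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW (skip-gram-like): the document tag alone predicts each word."""
    return [(tag, word) for word in words]


def pv_dm_examples(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM (CBOW-like): the document tag plus nearby words predict the centre word.

    A symmetric window is assumed here for illustration.
    """
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        # The tag and the context words are combined into a single input.
        examples.append(((tag, *left, *right), target))
    return examples


# Hypothetical toy document with a hypothetical tag.
doc_tag = "doc_0"
tokens = ["the", "cat", "sat", "down"]

dbow = pv_dbow_examples(doc_tag, tokens)
dm = pv_dm_examples(doc_tag, tokens, window=1)

print(dbow)
print(dm)
```

In PV-DBOW the input never includes neighbouring words, so only the document vector has to carry the information needed to predict the text. In PV-DM the document vector shares that job with the context word vectors.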

556 questions

votes

1 answer

How to put maximum vocabulary frequency in doc2vec

When creating the vocabulary, Doc2vec lets you set the minimum number of occurrences a word needs in the documents to be included in the vocabulary, via the parameter min_count. model = gensim.models.doc2vec.Doc2Vec(vector_size=200, min_count=3,…

asked Jun 06 '19 at 13:14

Igor sharm

votes

1 answer

gensim Doc2Vec word not in vocabulary

I am training a doc2vec gensim model with a txt file 'full_texts.txt' that contains ~1600 documents. Once I have trained the model, I wish to use similarity methods over words and sentences. However, since this is my first time using gensim, I am…

python nlp gensim word2vec doc2vec

asked Apr 27 '19 at 20:18

Shoaibkhanz

1 answer

What text processing does WikiCorpus perform in gensim?

I have trained a doc2vec model on the Wikipedia corpus using gensim and I wish to retrieve vectors from different documents. I was wondering what text processing the WikiCorpus function did when I used it to train my model e.g. removed punctuation,…

python gensim doc2vec

asked Apr 12 '19 at 23:25

OultimoCoder

votes

0 answers

Doc2Vec error: need at least one array to concatenate

I am running into an error trying to apply a doc2vec model to some text. The tutorial I am following is here. However I cannot seem to "replicate" the results on some new text information. I have read other SO posts about this issue and it's because…

python doc2vec

asked Apr 10 '19 at 14:13

user113156

0 answers

Doc2vec on a corpus of novels: how do I assign to each sentence of a novel one tag for the ID of the sentence and one tag for the ID of the book?

I am trying to train a doc2vec model on a corpus of six novels and I need to build the corpus of Tagged Documents. Each novel is a txt file, already preprocessed and read into python using the read() method, so that it appears as a "long string".…

python gensim doc2vec

asked Mar 27 '19 at 11:56

Federica Martinelli

votes

1 answer

Doc2Vec: get text of the label

I've trained a Doc2Vec model and I'm trying to get predictions. I use test_data = word_tokenize("Филип Моррис Продактс С.А.".lower()) model = Doc2Vec.load(model_path) v1 = model.infer_vector(test_data) sims =…

python gensim doc2vec

asked Feb 17 '19 at 19:29

Petr Petrov

1 answer

I get more vectors than my documents size - gensim doc2vec

I have protein sequences and want to do doc2vec. My goal is to have one vector for each sentence/sequence. I have 1612 sentences/sequences and 30 classes so the label is not unique and many documents share the same labels. So when I first tried…

python tags gensim doc2vec

asked Feb 04 '19 at 17:28

user10950908

votes

1 answer

Gensim Doc2vec – KeyError: "tag not seen in training corpus/invalid"

I am using gensim's Doc2vec to learn features from news articles. I can successfully train my documents. However, I struggle to retrieve the document vectors from the model for further processing. Example code (directly taken from gensim's…

python gensim doc2vec

asked Dec 15 '18 at 20:04

petezurich

1 answer

Python Calculating similarity between two documents using word2vec, doc2vec

I am trying to calculate the similarity between two documents, each consisting of thousands of sentences. The baseline would be calculating cosine similarity using BOW. However, I want to capture more of the semantic difference between…

python similarity gensim word2vec doc2vec

asked Nov 25 '18 at 12:25

ChanKim

votes

1 answer

Gensim Doc2vec model: how to compute similarity on a corpus obtained using a pre-trained doc2vec model?

I have a model based on doc2vec trained on multiple documents. I would like to use that model to infer the vectors of another document, which I want to use as the corpus for comparison. So, when I look for the most similar sentence to one I…

python nlp gensim doc2vec

asked Nov 19 '18 at 14:11

José Santos

votes

1 answer

Unsupervised sentiment Analysis using doc2vec

Folks, I have searched Google for different types of papers/blogs/tutorials etc. but haven't found anything helpful. I would appreciate it if anyone can help me. Please note that I am not asking for code step-by-step but rather an idea/blog/paper or some…

nlp gensim word2vec sentiment-analysis doc2vec

asked Nov 09 '18 at 20:32

Saurabh Gokhale

1 answer

GridSearch for doc2vec model built using gensim

I am trying to find the best hyperparameters for my trained doc2vec gensim model, which takes a document as input and creates its document embeddings. My training data consists of text documents but it doesn't have any labels, i.e. I just have 'X' but not…

machine-learning gensim grid-search doc2vec hyperparameters

asked Oct 18 '18 at 14:12

Rajat

votes

1 answer

How to classify text documents in legal domain

I've been working on a project which is about classifying text documents in the legal domain (Legal Judgment Prediction class of problems). The given data set consists of 700 legal documents (well balanced in two classes). After the preprocessing,…

python svm text-classification word-embedding doc2vec

asked Oct 01 '18 at 12:49

hey_rey

votes

0 answers

Doc2Vec with Keras

Following Mikolov's paper, I want to compute Doc2Vec using Keras. I'm new to Keras, so I need your help. There is a corpus of documents, each with an ID, and I want to get two embedding matrices: one for words and one for paragraphs, is that right? Is it…

machine-learning keras doc2vec

asked Sep 26 '18 at 12:48

user1789654813

votes

1 answer

Doc2vec output data for only a single document and not two documents vectors

I am trying to build a simple program to test my understanding of Doc2Vec, and it seems I still have a long way to go. I understand that each sentence in the document is first labeled with its own label and for doc2vec…

python doc2vec

asked Sep 21 '18 at 04:47

JJson
