Questions tagged [doc2vec]

Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
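
To make the two modes concrete, here is a conceptual sketch in plain Python of which inputs each mode uses to predict which word. It is not Gensim's internal code. The function names and the `window` parameter are illustrative assumptions. In Gensim itself the mode is selected with the `dm` parameter of `Doc2Vec` (`dm=1` for PV-DM, `dm=0` for PV-DBOW).

```python
from typing import List, Tuple

Example = Tuple[Tuple[str, ...], str]  # (inputs, target word)


def pv_dbow_examples(doc_tag: str, words: List[str]) -> List[Example]:
    """PV-DBOW: the document vector alone is trained to predict each word,
    much as skip-gram uses a single word to predict its neighbours."""
    return [((doc_tag,), w) for w in words]


def pv_dm_examples(doc_tag: str, words: List[str], window: int = 2) -> List[Example]:
    """PV-DM: the document vector plus the surrounding context words are
    combined (averaged or concatenated) to predict the centre word, as in CBOW."""
    examples = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        examples.append(((doc_tag, *left, *right), target))
    return examples


words = ["the", "cat", "sat", "down"]
print(pv_dbow_examples("doc0", words)[1])          # (('doc0',), 'cat')
print(pv_dm_examples("doc0", words, window=1)[1])  # (('doc0', 'the', 'sat'), 'cat')
```

The only structural difference is the input side: PV-DBOW never looks at neighbouring words, while PV-DM always mixes the document vector with a context window.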

556 questions

1 answer

How to save gensim doc2vec model

After training the model, I use infer_vector() to get the vector successfully, but after I save the model and load it again, the following error appears: print "infer:", model.infer_vector(sents[0]).tolist() File…

python gensim doc2vec

asked Jun 18 '17 at 09:32

Zafedom

1 answer

doc2vec: Pull documents from inferred document

I am new to word/paragraph embeddings and am trying to understand them through doc2vec in Gensim. I would like advice on whether my understanding is incorrect. My understanding is that doc2vec is potentially able to return documents that may have…

doc2vec

asked Jun 16 '17 at 12:59

Jax
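
The usual way to "pull" documents for a new text is to infer a vector for it and then rank the trained document vectors by cosine similarity; Gensim's `most_similar` on the document vectors (`model.docvecs` in older releases, `model.dv` in Gensim 4) performs this kind of ranking. The plain-Python sketch below shows only the ranking step. The vectors and tags are toy values, and `rank_by_cosine` is a hypothetical helper, not a Gensim function.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: List[float], doc_vectors: Dict[str, List[float]],
                   topn: int = 3) -> List[Tuple[str, float]]:
    """Return the `topn` (tag, similarity) pairs, most similar first."""
    scored = [(tag, cosine(query, vec)) for tag, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy example: an "inferred" vector compared against three trained document vectors.
doc_vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [-1.0, 0.0]}
inferred = [0.9, 0.1]
print(rank_by_cosine(inferred, doc_vectors, topn=2))  # doc_a first, then doc_b
```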

1 answer

How to count frequency in gensim.Doc2Vec?

I am training a model with gensim. My corpus consists of many short sentences, and each sentence has a frequency indicating how many times it occurs in the total corpus. I implement it as follows; as you can see, I simply repeat each sentence freq times. Anyway, if the…

python gensim word2vec doc2vec

asked Jun 12 '17 at 10:29

roger
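
If repeating each sentence `freq` times is the approach you keep, the repetition can be streamed instead of materialized as one long list. The sketch below assumes the corpus is available as `(tokens, freq)` pairs and uses a local namedtuple with the same `words`/`tags` fields as Gensim's `TaggedDocument`; the `RepeatedCorpus` class is hypothetical. Whether plain repetition is the best way to weight sentences is a separate modelling question that this does not settle.

```python
from collections import namedtuple
from typing import Iterator, List, Tuple

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


class RepeatedCorpus:
    """Yields each sentence `freq` times without building the repeated list.

    Written as a class with __iter__ (not a one-shot generator) so the corpus
    can be iterated again for vocabulary building and for every epoch.
    """

    def __init__(self, weighted_sentences: List[Tuple[List[str], int]]):
        self.weighted_sentences = weighted_sentences

    def __iter__(self) -> Iterator[TaggedDocument]:
        for idx, (tokens, freq) in enumerate(self.weighted_sentences):
            for _ in range(freq):
                # All copies share one tag, so they update the same document vector.
                yield TaggedDocument(words=tokens, tags=[idx])


corpus = RepeatedCorpus([(["hello", "world"], 2), (["bye"], 1)])
print(sum(1 for _ in corpus))  # 3
```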

1 answer

How to train doc2vec on an AWS cluster using Spark

I'm using Python Gensim to train doc2vec. Is there any way to distribute this code on AWS (S3)? Thank you in advance.

python-2.7 amazon-s3 aws-lambda doc2vec

asked May 30 '17 at 05:06

Regina

1 answer

'Doc2Vec' object has no attribute 'wv'

When I load a doc2vec model from a pkl file, I get this error: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in…

python nlp gensim word2vec doc2vec

asked Apr 22 '17 at 00:55

Amnesiac

1 answer

Why is the cosine similarity from a pretrained fastText model high between two sentences that are not related at all?

I am wondering why the pretrained 'fasttext model' trained on Korean Wikipedia seems not to work well! :( model = fasttext.load_model("./fasttext/wiki.ko.bin") model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로") [roughly: "test test this is a test sentence" vs. "now, a text that is not related at all, really, really"] (in…

word2vec cosine-similarity doc2vec fasttext

asked Apr 18 '17 at 15:41

DSDS
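
One possible contributor, offered as an assumption rather than a diagnosis of this particular question: when a sentence vector is built by averaging word vectors and those word vectors share a large common component, the averages of two unrelated sentences can point in nearly the same direction. The toy vectors below are invented purely to illustrate that effect; subtracting the shared component before comparing is one mitigation sometimes used.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy word vectors: a large shared component plus a small word-specific part.
shared = [5.0, 5.0, 5.0]


def word_vec(specific: List[float]) -> List[float]:
    return [s + d for s, d in zip(shared, specific)]


sentence_1 = [word_vec([1.0, 0.0, 0.0]), word_vec([0.0, 1.0, 0.0])]
sentence_2 = [word_vec([0.0, 0.0, 1.0]), word_vec([0.0, -1.0, 0.0])]

v1, v2 = average(sentence_1), average(sentence_2)
print(round(cosine(v1, v2), 3))  # high, despite unrelated word-specific parts

# Removing the shared component (known here only because the data is synthetic)
# leaves the word-specific directions, which are not similar at all.
centered_1 = [a - s for a, s in zip(v1, shared)]
centered_2 = [b - s for b, s in zip(v2, shared)]
print(round(cosine(centered_1, centered_2), 3))
```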

1 answer

Gensim: error while loading pretrained doc2vec model?

I'm loading a pretrained Doc2Vec model using: from gensim.models import Doc2Vec model = Doc2Vec.load('/path/to/pretrained/model') I'm getting the following error: AttributeError: 'module' object has no attribute 'call_on_class_only' Does anyone…

python gensim doc2vec

asked Mar 31 '17 at 17:24

Regina

1 answer

How to build a Doc2Vec model using an 'iterable' object

My code is running out of memory because of the issue described in the question I asked on this page. So I wrote a second version with an iterable alldocs instead of an all-in-memory alldocs, changing my code based on the explanation on this page. I am not familiar with…

python iterator gensim doc2vec

asked Feb 21 '17 at 16:07

user3092781
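
Gensim's `Doc2Vec` makes several passes over the corpus (one to build the vocabulary, then one per training epoch), so the corpus must be re-iterable: a class whose `__iter__` reopens the source works, while a plain generator is exhausted after the first pass. A minimal sketch, assuming one whitespace-tokenized document per line of a UTF-8 text file; the file layout and the names `LineCorpus` and `one_shot` are hypothetical, and `TaggedDocument` is a local stand-in with the same fields as Gensim's.

```python
from collections import namedtuple
from typing import Iterator

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


class LineCorpus:
    """Streams one document per line from a text file; can be iterated repeatedly."""

    def __init__(self, path: str):
        self.path = path

    def __iter__(self) -> Iterator[TaggedDocument]:
        # Reopening the file on each call lets every pass start from the top,
        # while only one line is held in memory at a time.
        with open(self.path, encoding="utf-8") as handle:
            for line_no, line in enumerate(handle):
                yield TaggedDocument(words=line.split(), tags=[line_no])


def one_shot(path: str) -> Iterator[TaggedDocument]:
    # A generator like this can be consumed only once: a second pass yields nothing.
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle):
            yield TaggedDocument(words=line.split(), tags=[line_no])
```

With real Gensim, a `LineCorpus` instance would be passed wherever the all-in-memory `alldocs` list was used before.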

0 answers

Classifier Accuracy - Too good to believe

Problem statement: classify product reviews into the classes Travel, Hotel, Cars, Electronics, Food and Movies. I am approaching this as the well-known text classification problem. The feature set is prepared using the default Doc2Vec model from gensim, and for…

python pca gensim text-classification doc2vec

asked Jan 11 '17 at 15:10

Rashmi Singh

1 answer

Gensim Doc2Vec Exception AttributeError: 'str' object has no attribute 'words'

I am learning the Doc2Vec model from the gensim library and using it as follows: class MyTaggedDocument(object): def __init__(self, dirname): self.dirname = dirname def __iter__(self): for fname in os.listdir(self.dirname): …

python neural-network gensim word2vec doc2vec

asked Dec 19 '16 at 13:07

Rashmi Singh
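
That error typically means the corpus yields plain strings where Gensim expects objects with a `words` attribute (a list of tokens) and a `tags` attribute (a list of tags). A sketch of wrapping raw text, using a local namedtuple that mirrors the two fields of Gensim's `TaggedDocument`; the `to_tagged` helper and the `doc_N` tag scheme are hypothetical.

```python
from collections import namedtuple
from typing import Iterable, Iterator

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def to_tagged(texts: Iterable[str]) -> Iterator[TaggedDocument]:
    for i, text in enumerate(texts):
        # `words` must be a list of tokens, not the raw string, and `tags`
        # must be a list even for a single tag; a bare string would be
        # iterated character by character.
        yield TaggedDocument(words=text.lower().split(), tags=["doc_%d" % i])


raw = ["Gensim trains document vectors", "Each item needs words and tags"]
docs = list(to_tagged(raw))
print(hasattr(raw[0], "words"), hasattr(docs[0], "words"))  # False True
```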

1 answer

word vector and paragraph vector query

I am trying to understand the relation between word2vec and doc2vec vectors in Gensim's implementation. In my application, I am tagging multiple documents with the same label (topic), and I am training a doc2vec model on my corpus using dbow_words=1 in order to…

similarity gensim word2vec temporal doc2vec

asked Nov 07 '16 at 18:30

user7127620
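
In Gensim, each distinct tag corresponds to one trained vector, so documents that share a tag all update that same vector. Because `tags` is a list, a document can also carry both a unique ID and a shared topic label, which gives per-document and per-topic vectors from one training run. A sketch of building such documents, with toy data, a hypothetical `build_docs` helper and a local `TaggedDocument` stand-in:

```python
from collections import Counter, namedtuple
from typing import List, Tuple

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def build_docs(labelled: List[Tuple[str, str]]) -> List[TaggedDocument]:
    docs = []
    for i, (text, topic) in enumerate(labelled):
        # A unique ID tag plus a shared topic tag.
        docs.append(TaggedDocument(words=text.split(), tags=["id_%d" % i, topic]))
    return docs


docs = build_docs([
    ("stocks fell sharply", "finance"),
    ("rates rose again", "finance"),
    ("the team won", "sport"),
])
# How many documents contribute to each tag's vector.
tag_counts = Counter(tag for d in docs for tag in d.tags)
print(tag_counts)
```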

1 answer

Embedding lookup from multiple embeddings in tensorflow

When building a doc2vec algorithm, you need several embeddings at once: embeddings for the word vectors and, at the same time, embeddings for the documents themselves. The way the algorithm works is similar to that…

nlp tensorflow word2vec doc2vec

asked Sep 13 '16 at 23:35

TheM00s3
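
In PV-DM the model keeps two separate tables: one row per document and one row per vocabulary word. For each training example it looks up the document's row and the context words' rows and combines them (here by averaging, as Gensim does with `dm_mean=1`; the paper also describes concatenation) into the input that predicts the target word. A framework-free sketch of that lookup with tiny hand-written tables; `pv_dm_input` is a hypothetical name, and in TensorFlow the two lookups would typically be two `tf.nn.embedding_lookup` calls on separate variables.

```python
from typing import List


def pv_dm_input(doc_id: int, context_word_ids: List[int],
                doc_table: List[List[float]], word_table: List[List[float]]) -> List[float]:
    """Average the document row with the context word rows (PV-DM, mean mode).

    During training, the error at this combined input is propagated back to
    both tables: the document's row and each context word's row.
    """
    rows = [doc_table[doc_id]] + [word_table[w] for w in context_word_ids]
    dim = len(rows[0])
    return [sum(row[k] for row in rows) / len(rows) for k in range(dim)]


doc_table = [[1.0, 0.0], [0.0, 1.0]]               # 2 documents, dimension 2
word_table = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]  # 3 vocabulary words
h = pv_dm_input(doc_id=0, context_word_ids=[1, 2], doc_table=doc_table, word_table=word_table)
print(h)
```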

1 answer

Readlines function for an xlsx file works improperly

The goal is sentiment classification. The steps are to open 3 xlsx files, read them, process them with gensim.doc2vec methods and classify with SGDClassifier. I am just trying to reproduce this code with doc2vec. Python 2.7 with open('C:/doc2v/trainpos.xlsx','r')…

python xlsx readlines doc2vec

asked Sep 01 '16 at 13:24

Talka
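
`readlines()` treats a file as text, but an .xlsx workbook is a ZIP archive of XML parts (Office Open XML), so reading it line by line yields binary noise rather than rows. The small stdlib check below is a hypothetical helper that looks for the workbook part Excel normally writes at `xl/workbook.xml`. Actually reading the cells calls for a spreadsheet library such as openpyxl or `pandas.read_excel`, or for exporting the sheets to CSV first.

```python
import zipfile


def looks_like_xlsx(path: str) -> bool:
    """True if `path` is a ZIP archive containing the usual workbook part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "xl/workbook.xml" in archive.namelist()
```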

1 answer

updates of the document vectors in doc2vec (PV-DM) in gensim

I'm trying to understand the PV-DM implementation with averaging in gensim. In the function train_document_dm in doc2vec.py, the return value ("errors") of train_cbow_pair is, in the case of averaging (cbow_mean=1), not divided by the number of input…

python numpy gensim word2vec doc2vec

asked Aug 31 '16 at 14:51

саша
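
For reference, the calculus behind the averaging question, as a general derivation rather than a statement about any particular Gensim version. Let v_1, …, v_n be the input vectors (the document vector and the context word vectors), h the combined projection and L the training loss. Under averaging the chain rule scales each input's gradient by 1/n; under summation it does not.

```latex
\text{mean: } h = \frac{1}{n}\sum_{i=1}^{n} v_i
\;\Longrightarrow\;
\frac{\partial L}{\partial v_i} = \frac{1}{n}\,\frac{\partial L}{\partial h},
\qquad
\text{sum: } h = \sum_{i=1}^{n} v_i
\;\Longrightarrow\;
\frac{\partial L}{\partial v_i} = \frac{\partial L}{\partial h}
```

So, with averaging, adding the undivided error \(\partial L/\partial h\) to every input gives each input an update n times larger than the exact gradient step, which behaves like a larger effective learning rate for those inputs.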

1 answer

How to use doc2vec with phrases?

I want to use phrases in doc2vec, and I use gensim.phrases. In doc2vec we need tagged documents to train the model, and I cannot tag the phrases. How can I do this? Here is my code: text = phrases.Phrases(text) for i in range(len(text)): string1 =…

python nlp gensim phrases doc2vec

asked Aug 16 '16 at 06:53

Majid
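
Phrase detection changes the `words`, not the `tags`: run each document's tokens through the phrase model so that detected bigrams become single tokens (Gensim joins them with an underscore by default), then wrap the transformed tokens in a tagged document as usual. The sketch below uses a hypothetical `join_bigrams` with a hand-written bigram set in place of a trained `gensim.models.phrases.Phrases` model; with real Gensim, the transformation step would be `bigram_model[tokens]`.

```python
from collections import namedtuple
from typing import Iterator, List, Set, Tuple

# Local stand-in with the same fields as gensim.models.doc2vec.TaggedDocument.
TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def join_bigrams(tokens: List[str], bigrams: Set[Tuple[str, str]], delimiter: str = "_") -> List[str]:
    """Greedy left-to-right merge of known bigrams (stand-in for a trained Phrases model)."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            out.append(tokens[i] + delimiter + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def tagged_with_phrases(texts: List[str], bigrams: Set[Tuple[str, str]]) -> Iterator[TaggedDocument]:
    for i, text in enumerate(texts):
        # Phrases become ordinary tokens in `words`; the tag still identifies the document.
        yield TaggedDocument(words=join_bigrams(text.split(), bigrams), tags=[i])


bigrams = {("new", "york"), ("machine", "learning")}
docs = list(tagged_with_phrases(["i love new york", "machine learning in new york"], bigrams))
print(docs[1].words)  # ['machine_learning', 'in', 'new_york']
```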
