Doc2Vec is an unsupervised algorithm used to convert documents into vectors ("dense embeddings"). It is based on the "Paragraph Vector" paper and is implemented in the Gensim Python library and elsewhere. The algorithm can work in either a "Distributed Bag of Words" mode (PV-DBOW, which works somewhat analogously to skip-gram mode in Word2Vec) or a "Distributed Memory" mode (PV-DM, which is more analogous to CBOW mode in Word2Vec).
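As a rough illustration of the difference between the two modes, the sketch below builds the (input, target) training pairs each mode would learn from for one tiny document. It is a simplified model of the idea, not Gensim's implementation: the function names, the "DOC_0" tag, and the symmetric context window are assumptions made for the example.

```python
from typing import List, Tuple


def pv_dbow_pairs(tag: str, words: List[str]) -> List[Tuple[str, str]]:
    """PV-DBOW sketch: the document tag alone is the input for each target word."""
    return [(tag, target) for target in words]


def pv_dm_pairs(
    tag: str, words: List[str], window: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """PV-DM sketch: the document tag plus nearby context words predict the target.

    A symmetric window is an assumption of this sketch.
    """
    pairs = []
    for i, target in enumerate(words):
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        pairs.append(((tag, *left, *right), target))
    return pairs


doc = ["the", "cat", "sat", "down"]  # hypothetical toy document
dbow = pv_dbow_pairs("DOC_0", doc)
dm = pv_dm_pairs("DOC_0", doc, window=1)

print(dbow[1])  # input is only the tag
print(dm[1])    # input is the tag plus the neighbouring words
```

In PV-DBOW the input never includes other words, which is what makes it resemble skip-gram; in PV-DM the tag is combined with a context window, as in CBOW.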
Questions tagged [doc2vec]
556 questions
0
votes
1 answer
How to save a gensim doc2vec model
After training the model, I use infer_vector() to get the vector successfully.
But after I save the model and load it again, an error appears as follows:
print "infer:", model.infer_vector(sents[0]).tolist()
File…

Zafedom
- 1
- 1
0
votes
1 answer
doc2vec: Pull documents from inferred document
I am new to word/paragraph embeddings and am trying to understand them via doc2vec in Gensim. I would like to seek advice on whether my understanding is incorrect. My understanding is that doc2vec is potentially able to return documents that may have…

Jax
- 33
- 1
- 7
0
votes
1 answer
How to count frequency in gensim.Doc2Vec?
I am training a model with gensim. My corpus consists of many short sentences, and each sentence has a frequency that indicates how many times it occurs in the total corpus. I implemented it as follows; as you can see, I simply repeat each sentence freq times. Anyway, if the…

roger
- 9,063
- 20
- 72
- 119
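A minimal sketch of the "repeat freq times" idea described in this question, written so that the repeats are generated lazily on each pass instead of being stored in a list. The TaggedDocument below is a stdlib stand-in that mirrors the field names (words, tags) of Gensim's class; RepeatedCorpus and the sample data are hypothetical, and reusing the same tag for every repeat is a design assumption of the sketch.

```python
from collections import namedtuple

# Stand-in with the same field names as gensim's TaggedDocument (assumption:
# swap in gensim.models.doc2vec.TaggedDocument when training a real model).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


class RepeatedCorpus:
    """Restartable iterable that yields each sentence `freq` times per pass."""

    def __init__(self, sentences_with_freq):
        self.sentences_with_freq = sentences_with_freq

    def __iter__(self):
        for idx, (sentence, freq) in enumerate(self.sentences_with_freq):
            doc = TaggedDocument(words=sentence.split(), tags=[idx])
            for _ in range(freq):
                # The same tag is reused, so every repeat refers to one document.
                yield doc


corpus = RepeatedCorpus([("hello world", 3), ("good morning", 1)])
first_pass = list(corpus)
second_pass = list(corpus)
print(len(first_pass), len(second_pass))
```

Because __iter__ starts over on every call, the object can be iterated once per training pass without holding the expanded corpus in memory.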
0
votes
1 answer
How to train doc2vec on an AWS cluster using Spark
I'm using Python Gensim to train doc2vec. Is there any way to distribute this code on AWS (S3)?
Thank you in advance

Regina
- 115
- 4
- 13
0
votes
1 answer
'Doc2Vec' object has no attribute 'wv'
When I load a doc2vec model from a pkl file, I get this error.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in…

Amnesiac
- 661
- 1
- 10
- 30
0
votes
1 answer
Why is the cosine_similarity of a pretrained fastText model high between two sentences that are not related at all?
I am wondering why the pre-trained fastText model trained on wiki (Korean) does not seem to work well! :(
model = fasttext.load_model("./fasttext/wiki.ko.bin")
model.cosine_similarity("테스트 테스트 이건 테스트 문장", "지금 아무 관계 없는 글 정말로 정말로")
(in…

DSDS
- 57
- 7
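One possible contributing factor, offered only as an assumption and not as a diagnosis of this question: if sentence vectors are built by combining word vectors, they can all share a large common component, and cosine similarity then stays high even when the distinctive parts are unrelated. The toy vectors below are invented for illustration and are not fastText output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical shared component plus small, mutually orthogonal distinctive parts.
common = [3.0, 3.0, 3.0]
sentence_a = [c + d for c, d in zip(common, [1.0, 0.0, 0.0])]
sentence_b = [c + d for c, d in zip(common, [0.0, 1.0, 0.0])]

raw_similarity = cosine(sentence_a, sentence_b)

# Removing the shared component exposes how unrelated the distinctive parts are.
centered_a = [a - c for a, c in zip(sentence_a, common)]
centered_b = [b - c for b, c in zip(sentence_b, common)]
centered_similarity = cosine(centered_a, centered_b)

print(raw_similarity, centered_similarity)
```

In this toy case the raw similarity is close to 1 while the centered similarity is 0, so a high raw score alone does not show that two sentences are related.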
0
votes
1 answer
Gensim: error while loading pretrained doc2vec model?
I'm loading a pretrained Doc2Vec model using:
from gensim.models import Doc2Vec
model = Doc2Vec.load('/path/to/pretrained/model')
I'm getting the following error:
AttributeError: 'module' object has no attribute 'call_on_class_only'
Does anyone…

Regina
- 115
- 4
- 13
0
votes
1 answer
How to build a Doc2Vec model by using an 'iterable' object
My code is running out of memory because of the problem I asked about on this page. I then wrote a second version that uses an iterable alldocs instead of an all-in-memory alldocs. I changed my code based on the explanation on this page. I am not familiar with…

user3092781
- 313
- 2
- 16
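A sketch of what an iterable, not all-in-memory alldocs could look like, assuming one document per line in a text file. The class re-opens the file every time iteration starts, so it can be traversed once per training pass. LineCorpus, the temporary file, and the namedtuple stand-in for Gensim's TaggedDocument are all hypothetical names introduced for the example.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in with the same field names as gensim's TaggedDocument (assumption).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


class LineCorpus:
    """Restartable iterable: re-reads the file on each pass, one document per line."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line_no, line in enumerate(handle):
                yield TaggedDocument(words=line.split(), tags=[line_no])


# Write a tiny sample file so the sketch is runnable as-is.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first document here\nsecond document here\n")
    tmp_path = tmp.name

alldocs = LineCorpus(tmp_path)
pass_one = [doc.tags[0] for doc in alldocs]
pass_two = [doc.tags[0] for doc in alldocs]  # a plain generator would be empty here
os.remove(tmp_path)

print(pass_one, pass_two)
```

A generator function would be exhausted after the first pass; an object with its own __iter__ method avoids that while still reading only one line at a time.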
0
votes
0 answers
Classifier Accuracy - Too good to believe
Problem statement - classify a product review
Classes - Travel, Hotel, Cars, Electronics, Food, Movies
I am approaching this as a standard text classification problem. The feature set is prepared using the default Doc2Vec model from gensim and for…

Rashmi Singh
- 519
- 1
- 8
- 20
0
votes
1 answer
Gensim Doc2Vec Exception AttributeError: 'str' object has no attribute 'words'
I am learning the Doc2Vec model from the gensim library and using it as follows:
class MyTaggedDocument(object):
    def __init__(self, dirname):
        self.dirname = dirname
    def __iter__(self):
        for fname in os.listdir(self.dirname):
…

Rashmi Singh
- 519
- 1
- 8
- 20
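The error message in this question's title suggests that plain strings reached a place where objects with a words attribute were expected. The excerpt is truncated, so the actual cause is not visible here; the sketch below only shows the general shape of the fix under that assumption. The namedtuple is a stand-in for Gensim's TaggedDocument, and as_tagged and the sample reviews are hypothetical.

```python
from collections import namedtuple

# Stand-in with the same field names as gensim's TaggedDocument (assumption).
TaggedDocument = namedtuple("TaggedDocument", "words tags")


def as_tagged(raw_docs):
    """Wrap each raw string in an object that has .words and .tags."""
    for idx, text in enumerate(raw_docs):
        yield TaggedDocument(words=text.lower().split(), tags=[idx])


raw = ["First review text", "Second review text"]

# A bare string has no .words attribute, which matches the reported error.
raw_has_words = hasattr(raw[0], "words")

tagged = list(as_tagged(raw))
tagged_has_words = all(hasattr(doc, "words") for doc in tagged)

print(raw_has_words, tagged_has_words)
```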
0
votes
1 answer
word vector and paragraph vector query
I am trying to understand the relation between word2vec and doc2vec vectors in Gensim's implementation. In my application, I am tagging multiple documents with the same label (topic). I am training a doc2vec model on my corpus using dbow_words=1 in order to…

user7127620
- 3
- 2
0
votes
1 answer
Embedding lookup from multiple embeddings in tensorflow
When building a doc2vec algorithm, multiple embeddings are needed: there are embeddings for the word vectors and, at the same time, embeddings for the documents themselves. The way the algorithm works is similar to that…

TheM00s3
- 3,677
- 4
- 31
- 65
0
votes
1 answer
Readlines function for an xlsx file works improperly
The goal is sentiment classification. The steps are to open 3 xlsx files, read them, process them with gensim.doc2vec methods, and classify with SGDClassifier. I am just trying to reproduce this code with doc2vec. Python 2.7
with open('C:/doc2v/trainpos.xlsx','r')…

Talka
- 15
- 10
0
votes
1 answer
updates of the document vectors in doc2vec (PV-DM) in gensim
I'm trying to understand the PV-DM implementation with averaging in gensim.
In the function train_document_dm in doc2vec.py, the return value ("errors") of train_cbow_pair is, in the case of averaging (cbow_mean=1), not divided by the number of input…

саша
- 521
- 5
- 20
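As background for this question, the sketch below checks the textbook chain rule for an averaged input layer: when the hidden value is the mean of n inputs, the gradient reaching each input is the hidden-layer gradient divided by n. It uses a made-up scalar squared-error loss and hypothetical function names; it says nothing about what Gensim's train_cbow_pair actually does.

```python
def loss(inputs, target):
    """Toy loss: squared error between the mean of the inputs and a target."""
    hidden = sum(inputs) / len(inputs)
    return 0.5 * (hidden - target) ** 2


def analytic_input_gradients(inputs, target):
    """Chain rule: dL/dx_i = (dL/dhidden) * (1 / n) when hidden is the mean."""
    hidden = sum(inputs) / len(inputs)
    grad_hidden = hidden - target
    return [grad_hidden / len(inputs) for _ in inputs]


def numeric_input_gradients(inputs, target, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    grads = []
    for i in range(len(inputs)):
        plus = list(inputs)
        minus = list(inputs)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, target) - loss(minus, target)) / (2 * eps))
    return grads


inputs = [1.0, 2.0, 3.0]  # hypothetical scalar "vectors" for brevity
analytic = analytic_input_gradients(inputs, target=0.0)
numeric = numeric_input_gradients(inputs, target=0.0)

print(analytic)
print(numeric)
```

The two lists agree up to floating-point error, which is the sense in which exact gradient descent would scale the error by 1/n for averaged inputs; an implementation may deliberately choose a different scaling.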
0
votes
1 answer
How to use doc2vec with phrases?
I want to use phrases in doc2vec, and I use gensim.phrases. In doc2vec we need tagged documents to train the model, and I cannot tag the phrases. How can I do this?
Here is my code:
text = phrases.Phrases(text)
for i in range(len(text)):
    string1 =…

Majid
- 23
- 5