I am experimenting with gensim's doc2vec using the following code. As far as I understand from the tutorials, it should work. However, it raises AttributeError: 'list' object has no attribute 'words'.

from gensim.models.doc2vec import LabeledSentence, Doc2Vec
document = LabeledSentence(words=['some', 'words', 'here'], tags=['SENT_1']) 
model = Doc2Vec(document, size = 100, window = 300, min_count = 10, workers=4)

So what did I do wrong? Any help would be appreciated. Thank you. I am using Python 3.5 and gensim 0.12.4.

W.S.

1 Answer

The input to gensim.models.doc2vec.Doc2Vec should be an iterable of LabeledSentence objects (a list, for example), not a single LabeledSentence. Try:

model = Doc2Vec([document], size = 100, window = 1, min_count = 1, workers=1)

I have reduced the window size and min_count so that they make sense for the given input. Also, go through this nice tutorial on Doc2Vec if you haven't already.
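
To see why the bare document fails, here is a minimal sketch that does not use gensim at all. It assumes LabeledSentence behaves like a namedtuple with words and tags fields; FakeLabeledSentence and collect_words are hypothetical stand-ins for illustration, not gensim APIs.

```python
from collections import namedtuple

# Hypothetical stand-in for gensim's LabeledSentence, assumed here to be
# a namedtuple with `words` and `tags` fields.
FakeLabeledSentence = namedtuple("FakeLabeledSentence", ["words", "tags"])

document = FakeLabeledSentence(words=["some", "words", "here"], tags=["SENT_1"])


def collect_words(documents):
    """Mimic a consumer that expects an iterable of documents."""
    return [doc.words for doc in documents]


# Passing the bare document: iterating a namedtuple yields its fields,
# which are plain lists, so `.words` is looked up on a list.
try:
    collect_words(document)
    bare_error = None
except AttributeError as exc:
    bare_error = str(exc)

# Wrapping the document in a list: iteration now yields the document itself.
wrapped_words = collect_words([document])

print(bare_error)
print(wrapped_words)
```

If the assumption holds, this is the same failure mode as in the question: the model iterates over whatever it is given, so a single document is unpacked into its fields instead of being treated as one item.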

kampta
  • Thanks for helping, but now I get this error: OverflowError: Python int too large to convert to C long. Do you know why? Thanks. – W.S. Apr 15 '16 at 09:28
  • At which step are you getting this error? Can you post your error trace? – kampta Apr 15 '16 at 09:40
  • I think it was the trace below:
      File "C:\Anaconda3\envs\sandbox\lib\site-packages\gensim\models\word2vec.py", line 944, in seeded_vector
        once = random.RandomState(uint32(self.hashfxn(seed_string)))
      OverflowError: Python int too large to convert to C long
    – W.S. Apr 15 '16 at 09:44
  • Is your input the same as the one in the question? Please provide an MCVE (http://stackoverflow.com/help/mcve) – kampta Apr 15 '16 at 12:34
  • Yes.
      from gensim.models.doc2vec import LabeledSentence, Doc2Vec
      document = LabeledSentence(words=['some', 'words', 'here'], tags=['SENT_1'])
      model = Doc2Vec([document], size = 100, window = 1, min_count = 1, workers=1)
    – W.S. Apr 15 '16 at 20:30
  • Alright, I wasn't able to reproduce the error. However, https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/11197/gensim-word2vec-cython-on-windows/68017#post68017 seems to have a solution: define your own hash function and pass it to the model for training:
      def hash32(value):
          return hash(value) & 0xffffffff
      model = Doc2Vec([document], size = 100, window = 1, min_count = 1, workers=1, hashfxn=hash32)
    – kampta Apr 16 '16 at 02:56
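
The hash32 workaround from the last comment can be checked in isolation, without gensim. This is a rough sketch under the assumption that the overflow comes from Python's built-in hash() returning a value outside the unsigned 32-bit range; the sample values are made up for illustration.

```python
def hash32(value):
    """Keep only the low 32 bits of Python's built-in hash."""
    return hash(value) & 0xffffffff


# Hypothetical sample inputs; string hashes vary between interpreter runs
# unless PYTHONHASHSEED is fixed, so only the value range is checked here.
samples = ["some", "words", "here", "SENT_1", -1, 2**70]

results = [hash32(v) for v in samples]

# Masking with 0xffffffff always yields an integer in [0, 2**32 - 1],
# even when hash() itself returns a negative or very large number.
all_in_range = all(0 <= r <= 0xffffffff for r in results)
print(all_in_range)
```

Passing such a function as hashfxn, as the comment suggests, keeps every hash small enough to convert to an unsigned 32-bit integer.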