
I am trying to save a gensim Doc2Vec model. The model is trained on 9M document vectors and a vocabulary of around 1M words, but the save fails with the pickle error below. `top` shows that the program uses around 13GB of RAM. Also, since I need to re-train the model on new documents as and when required, I think saving all parameters is necessary.

Traceback (most recent call last):
  File "doc_2_vec.py", line 61, in <module>
    model.save("/data/model_wl_videos/model",pickle_protocol=2)
  File "/home/meghana/.local/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1406, in save
    super(Word2Vec, self).save(*args, **kwargs)
  File "/home/meghana.negi/.local/lib/python2.7/site-packages/gensim/utils.py", line 504, in save
    pickle_protocol=pickle_protocol)
  File "/home/meghana/.local/lib/python2.7/site-packages/gensim/utils.py", line 376, in _smart_save
    pickle(self, fname, protocol=pickle_protocol)
  File "/home/meghana/.local/lib/python2.7/site-packages/gensim/utils.py", line 930, in pickle
    _pickle.dump(obj, fout, protocol=protocol)
MemoryError

maggs
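
As a side note for anyone hitting the same error: the `save` frame in the traceback is gensim's `utils.SaveLoad.save`, which (at least in the gensim versions I have used) also accepts `separately` and `sep_limit` keyword arguments that store large numpy arrays as standalone `.npy` files instead of feeding them through pickle. Below is a minimal sketch of lowering `sep_limit`; the 1 MB threshold is an arbitrary choice of mine, and it will not help if the bulk of the memory is the vocabulary dict rather than the vector arrays.

# Sketch only, not verified on this model: push any numpy array >= 1 MB into
# its own .npy file so it never enters the pickle stream. `sep_limit` is a
# keyword argument of gensim's SaveLoad.save; the threshold value is arbitrary.
model.save(
    "/data/model_wl_videos/model",
    pickle_protocol=2,
    sep_limit=1024 * 1024,
)

# The companion .npy files are picked up automatically when loading, e.g.:
# model = gensim.models.Doc2Vec.load("/data/model_wl_videos/model")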
    `pickle` constructs the dump in memory before writing it to disk. This is done in order to get repeated object references right, IIRC. So you can expect an increase in memory consumption that is a considerable fraction of what the targeted structure uses. I had a case of a large dictionary that I wanted to pickle. Its TSV source was 2.5G, in memory it used 12G, and the `pickle` call temporarily increased the footprint to 20G. – lenz Aug 20 '17 at 20:49
  • I think I can guess what is causing your problem, but I don't know what you expect as an answer, since there's no question in your post. Probably you just need to run this code on a machine with more RAM (you have 16G now, right?). – lenz Aug 20 '17 at 21:00
  • @lenz Perfect. Thanks. Increasing memory worked. The saved model is 2G; it was using 13G of RAM. How do I check the pickle memory footprint so that the next time I save, I can estimate the memory needed in advance? – maggs Aug 21 '17 at 09:28
  • I don't know. But since memory usage is not one of Python's strengths, I would just estimate 100% of the usage to be safe. So if the process uses 13G, make sure the machine has 26G+, and you should be fine. – lenz Aug 21 '17 at 10:04
  • @lenz ok. Thanks. Will keep that in mind. – maggs Aug 21 '17 at 10:45
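
On the question in the comments about estimating the pickle footprint before saving: a generic way to measure the serialized size without touching disk is to stream the dump through a byte-counting writer. This is a plain-Python sketch, not a gensim feature, and it only reports the size of the final dump, not the temporary memory pickle itself needs for its memo table, so treat the number as a lower bound.

import pickle

class ByteCounter(object):
    """File-like sink that discards the data and only counts bytes written."""
    def __init__(self):
        self.size = 0

    def write(self, data):
        self.size += len(data)

def pickled_size(obj, protocol=2):
    """Return how many bytes pickle.dump() would write for obj."""
    sink = ByteCounter()
    pickle.dump(obj, sink, protocol=protocol)
    return sink.size

# Hypothetical usage on a smaller trial model before attempting the full save:
# print(pickled_size(trial_model))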

0 Answers