7

In Gensim's documentation, it says:

You can save trained models to disk and later load them back, either to continue training on new training documents or to transform new documents.

I would like to do this with a dictionary, a corpus and a TF-IDF model. However, the documentation says that this is possible without explaining how to save these objects and load them back again.

How do you do this?


I've been using pickle, but I don't know if this is right:

import pickle
pickle.dump(tfidf, open("tfidf.p", "wb"))
tfidf_reloaded = pickle.load(open("tfidf.p", "rb"))
Data

3 Answers

6

In general, you can save things with generic Python pickle, but most gensim models support their own native .save() method.

It takes a target filesystem path, and will save the model more efficiently than pickle, often by placing large component arrays in separate files alongside the main file. (When you later move the saved model, keep all these files with the same root name together.)

In particular, some models which have multi-gigabyte subcomponents may not save at all with pickle, but gensim's native .save() will work.

Models saved with .save() can typically be loaded by using the appropriate class's .load() method. (For example, if you've saved an instance of gensim.corpora.dictionary.Dictionary, you'd load it with gensim.corpora.dictionary.Dictionary.load(filepath).)

gojomo
6

Saving the Dict and Corpus to disk

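from gensim import corpora

dictionary.save(DICT_PATH)  # gensim's native save for the Dictionary
corpora.MmCorpus.serialize(CORPUS_PATH, corpus)  # write the corpus in Matrix Market format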

Loading the Dict and Corpus from disk

loaded_dict = corpora.Dictionary.load(DICT_PATH)
loaded_corp = corpora.MmCorpus(CORPUS_PATH)
BHA Bilel
1

Python's built-in pickle module can save most Python objects. For example:

import pickle

file_name = 'myModel.sav'
pickle.dump(my_model, open(file_name, 'wb'))
loaded_model = pickle.load(open(file_name, 'rb'))
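A slightly more defensive sketch of the same round trip uses with blocks, so the file handles are closed even if pickling fails. The my_model dictionary here is only a stand-in for a real trained model, and the temporary directory is an assumption made to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (hypothetical; any picklable object works).
my_model = {"idfs": {0: 1.5, 1: 0.5}, "num_docs": 3}

file_name = os.path.join(tempfile.mkdtemp(), "myModel.sav")

# Write the object to disk; the file is closed when the block exits.
with open(file_name, "wb") as f:
    pickle.dump(my_model, f)

# Read it back from the same path.
with open(file_name, "rb") as f:
    loaded_model = pickle.load(f)
```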
Anwarvic