Is there a way to persist instances of a class in memory or on the file system in Python? Can I do this with shelve?

The following line, taken from a tutorial, takes a long time to execute, and I need to cache its result for subsequent runs of the program.

clf = MultinomialNB().fit(X_train_counts, training_data['targets'])

Type of clf:

>>> type(clf)
<class 'sklearn.naive_bayes.MultinomialNB'>
Martijn Pieters
hpn

1 Answer

Yes, you can use shelve to persist instances of a class. shelve gives you a dictionary interface, making the process relatively transparent.

Underneath, shelve uses the pickle library; if the shelve API doesn't fit your needs, you can go straight to that module.
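
As a minimal sketch of the dictionary interface described above: the class `WordCounter`, the key `"counter"`, and the file name `cache` are hypothetical stand-ins chosen for illustration (they are not from the question), and a temporary directory is used only so the example cleans up after itself.

```python
import os
import shelve
import tempfile


class WordCounter:
    """Hypothetical stand-in for an object that is expensive to build."""

    def __init__(self, words):
        self.counts = {}
        for word in words:
            self.counts[word] = self.counts.get(word, 0) + 1


# A temporary directory keeps the sketch from leaving files behind;
# a real cache would live at a stable path instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_path = os.path.join(tmp_dir, "cache")

    # First run: build the object and store it under a key, as with a dict.
    with shelve.open(shelf_path) as shelf:
        shelf["counter"] = WordCounter(["spam", "eggs", "spam"])

    # Later run: reopen the shelf and read the instance back.
    with shelve.open(shelf_path) as shelf:
        restored = shelf["counter"]

restored_counts = restored.counts
```

The class definition has to be importable when the shelf is read back, because pickle stores a reference to the class rather than its code.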

scikit-learn explicitly supports pickle; see Model persistence:

After training a scikit-learn model, it is desirable to have a way to persist the model for future use without having to retrain. The following section gives you an example of how to persist a model with pickle.
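
For caching between program runs, one possible pattern is "load the pickle if it exists, otherwise compute and save it". The sketch below assumes hypothetical names (`load_or_compute`, `slow_training`, `clf.pickle`), and `slow_training` is only a placeholder for the expensive `MultinomialNB().fit(...)` call, so the example runs without scikit-learn installed.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute):
    """Return the object pickled at cache_path; compute and save it if absent."""
    try:
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    except FileNotFoundError:
        # No cache yet: do the expensive work once and store the result.
        result = compute()
        with open(cache_path, "wb") as cache_file:
            pickle.dump(result, cache_file)
        return result


calls = []  # records how often the expensive step actually ran


def slow_training():
    # Hypothetical placeholder for the slow model-fitting line.
    calls.append("trained")
    return {"labels": ["ham", "spam"]}


# A temporary directory is used only to keep the sketch self-cleaning.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clf.pickle")
    first = load_or_compute(path, slow_training)   # computes and writes the file
    second = load_or_compute(path, slow_training)  # reads the file instead
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.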

Martijn Pieters
  • Doesn't pickle have issues with some user defined objects? – Jakob Bowyer Sep 20 '14 at 18:26
  • @JakobBowyer: no, not really. Pickle may have issues with certain types of objects, but it is not specific to user defined objects. – Martijn Pieters Sep 20 '14 at 18:30
  • @MartijnPieters: Thanks. I use `os.path.isfile` to check whether the dump file exists before loading it. Is this the right way? I use joblib, which is mentioned in the link. – hpn Sep 20 '14 at 19:16
  • @hpn: you can use `os.path.isfile` or use exception handling (`try:`, `open(...)`, `except IOError: # handle file not being there`). – Martijn Pieters Sep 20 '14 at 19:30