I have a multiple linear regression model from statsmodels that I want to save and then use in a different Python script. From what I found online, the best way to do this seems to be cPickle. However, I get a MemoryError when I try to save the model as a pickle file and an EOFError when I try to load the pickled model. Earlier in my code I use cPickle to pickle a list of strings, and loading that list in my next Python script works just fine. I am not sure why the same method works for the list but not for my statsmodels model. Below are some snippets of my code:

In Python script #1:

import cPickle
import statsmodels.formula.api as smf

selected = ['name1',....,'name20']
model = smf.ols(formula, data).fit()

cPickle.dump(selected, open('predictors.p', 'w'))
cPickle.dump(model, open('model.p', 'w')) # Here I get a MemoryError

In Python script #2:

predictors = cPickle.load(open('predictors.p','r')) # This works 
model = cPickle.load(open('model.p','r')) # This results in an EOFerror


Traceback (most recent call last):
  File "MOS_predictor_test.py", line 305, in <module>
    model = cPickle.load(open('model.p','r'))
EOFError

How can I save this model and use it in a shorter script? It seems that cPickle may be unable to store this model because it runs out of memory. Is there something besides cPickle that would let me do what I am trying to do?
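One thing worth ruling out first, shown as a minimal sketch rather than a diagnosis: a dump that fails partway leaves an empty or truncated file behind, and loading such a file raises EOFError, so the second error may only be a consequence of the first. The sketch assumes Python 3, where `pickle` takes the place of `cPickle` with the same `dump`/`load` interface. The shortened list and the temporary paths are made up for illustration. It opens the files in binary mode inside `with` blocks, so each file is closed as soon as the dump or load finishes.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the list of predictor names in the question.
selected = ["name1", "name20"]
path = os.path.join(tempfile.mkdtemp(), "predictors.p")

# Binary mode plus a context manager: the file is flushed and closed
# as soon as the dump finishes.
with open(path, "wb") as fh:
    pickle.dump(selected, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

# Simulate a dump that failed before anything was written: the file
# exists but is empty, and loading it raises EOFError.
empty_path = os.path.join(tempfile.mkdtemp(), "model.p")
open(empty_path, "wb").close()

try:
    with open(empty_path, "rb") as fh:
        pickle.load(fh)
    load_error = None
except EOFError as exc:
    load_error = exc

print(restored)
print(type(load_error).__name__)
```

If the EOFError goes away once the dump itself succeeds, only the MemoryError needs explaining.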

HM14
You could try the `save` and `load` methods of the results instance. There is a problem with pickling when a formula is used, because the formula's `design_info` cannot be pickled. statsmodels has an imperfect workaround that removes `design_info` before pickling. (But I don't think that's the answer here, because dump should use the same getstate.) – Josef Apr 21 '17 at 01:09

0 Answers