
I am writing a class that does some data processing. I instantiated the class, did some processing, and pickled the results. The problem is that I then added new methods to the class, but when I loaded the pickled objects I couldn't call the new methods on them.

For example, this is my class:

from abc import ABC, abstractmethod
import numpy as np

class Play(ABC):
  def __init__(self, data):
    self.data = data
    
  @abstractmethod
  def method(self):
    pass

class M(Play):

  def __init__(self, data):
    super().__init__(data)

  def method(self):
    self.corr = np.corrcoef(self.data)

I did some processing:

mat = np.random.rand(102, 641)
s = M(mat)
s.method()

and pickled the object using dill:

import dill

def save_object(obj, filename):
    with open(filename, 'wb') as output:
        dill.dump(obj, output, dill.HIGHEST_PROTOCOL)

save_object(s, 'file.pkl')

Then I added a new method to the class and unpickled the file to apply it, but I couldn't:

class Play(ABC):
  def __init__(self, data):
    self.data = data
    
  @abstractmethod
  def method(self):
    pass

class M(Play):

  def __init__(self, data):
    super().__init__(data)

  def method(self):
    self.corr = np.corrcoef(self.data)
  
  def new(self):
    self.q = self.corr.shape

def load_object(filename):
  with open(filename, 'rb') as f:
    return dill.load(f)

obj = load_object('file.pkl')
obj.new()

and I get this error:

AttributeError: 'M' object has no attribute 'new'

How can I fix that?

Ahmed
  • It's because `dill` saved the original class definition. – martineau Jul 14 '21 at 01:52
  • why don't you just load file first, then define/add new methods? – Lei Yang Jul 14 '21 at 02:05
  • @LeiYang how can I do so? – Ahmed Jul 14 '21 at 02:22
  • in my opinion, data is just data, don't mix with the classes' states. pikle only promise you get same thing you write. that's enough. others is totally business logic and design patterns. you can state your final goal and let's come up with other solutions. – Lei Yang Jul 14 '21 at 02:27
  • I agree with you. my goal is to save the current version of the data for further processing and avoiding starting from scratch as it is very expensive computationally. I can think about writing the whole class first then applying it but I think It is not elegant @LeiYang – Ahmed Jul 14 '21 at 02:30
  • the design should depend on how the data looks like. you have a lot of options such as csv, json, even database. – Lei Yang Jul 14 '21 at 02:37
  • @ahmed Another possible option for you could be to use pytorch's `nn.Module` as the parent class that you inherit from and then just extend the class to write your own. The benefit is, you could use the natively available statedict saving functionality to save your class and it's data together. see here: https://pytorch.org/tutorials/beginner/saving_loading_models.html. If you would like some demo class on this one and how to save it, let me know and I will share an example here. – CypherX Jul 14 '21 at 02:48
  • @CypherX this is a good solution, if you can share an example it would be very helpful. Thanks a lot – Ahmed Jul 14 '21 at 10:59

2 Answers


I'm the dill author. You may want to look at the different pickling settings (see dill.settings). For example:

>>> class Foo(object):
...   x = 1
...   def bar(self, y):
...     return y + self.x
... 
>>> import dill
>>> f = Foo()
>>> s = dill.dumps(f)
>>> f.bar(5)
6
>>> 
>>> class Foo(object):
...   x = 10
...   def bar(self, y):
...     return y + self.x**2
... 
>>> g = dill.loads(s)
>>> g.bar(5)
105
>>> g = dill.loads(s, ignore=True)
>>> g.bar(5)
6
>>> dill.settings
{'protocol': 4, 'byref': False, 'fmode': 0, 'recurse': False, 'ignore': False}
>>> 

Here, ignore=True on load tells dill to ignore an existing class definition if a newer one exists. The other settings are used on dump/dumps; for example, byref=True tells dill not to store the class definition at all -- just to use whatever reference is available in the unpickling environment.
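For instance, here is a minimal sketch of the byref approach, assuming the class is resolvable by name in the loading environment (e.g. defined at module level):

```python
import dill

class Foo:
    x = 1
    def bar(self, y):
        return y + self.x

f = Foo()
# byref=True stores only a reference to Foo, not its definition
s = dill.dumps(f, byref=True)

# redefine Foo with an extra method, as in the question
class Foo:
    x = 1
    def bar(self, y):
        return y + self.x
    def baz(self):
        return "added after pickling"

# loading resolves the stored reference to the *current* Foo,
# so the instance picks up the new method
g = dill.loads(s)
print(g.baz())
```

The trade-off is that the pickle is no longer self-contained: it can only be loaded where a class with that name exists.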

Mike McKerns

An alternative is to use the joblib library. I also encourage you to check the joblib documentation.

import joblib

joblib.dump(obj, "filename.joblib")    ## Saving object to a file
obj = joblib.load("filename.joblib")   ## Loading object from a file

Save and Load your object: 1st time

class Foo(object):
    x = 1

    def bar(self, y):
        return y + self.x

f1 = Foo()

def checkFoo(foo: Foo):
    print(f'version: {foo.version}')

## save your object
>>> version = '0.0.1'
>>> f1.version = version
>>> checkFoo(f1)
version: 0.0.1
>>> joblib.dump(f1, f'object_store_v{version}.joblib')
['object_store_v0.0.1.joblib']

## load saved object (latest version)
>>> version = '0.0.1'
>>> f2 = joblib.load(f'object_store_v{version}.joblib')
>>> checkFoo(f2)
version: 0.0.1

Save and Load your object: 2nd time (after modification)

## Modify object and save new version
>>> version = '0.0.2'
>>> f2.version = version
>>> checkFoo(f2)
version: 0.0.2
>>> joblib.dump(f2, f'object_store_v{version}.joblib')
['object_store_v0.0.2.joblib']

## load saved object (latest version)
>>> version = '0.0.2'
>>> f3 = joblib.load(f'object_store_v{version}.joblib')
>>> checkFoo(f3)
version: 0.0.2


CypherX
    @Ahmed Since, it seems that your requirement was solved using the other answer, I am not posting the hack I talked about using `pytorch.nn.Module`. However, there is another simpler alternative that could work for you right out of the box: using `joblib` library. I hope this helps. – CypherX Jul 15 '21 at 17:20
  • 3
  • `joblib` is not a serialization library, it's for caching to memory or disk. `joblib` is built on `cloudpickle`, which is similar to `dill`. For completeness, there's a caching package called `klepto` which is built on `dill`, and is similar to `joblib`. Unfortunately there's not a unified solution. – Mike McKerns Jul 16 '21 at 00:30
  • @ahmed Thank you for accepting this answer. Please also consider **voting it up**. – CypherX Jul 26 '21 at 07:10