I pickled an instance of a class derived from ndarray, but the instance loses its custom attributes after pickling and unpickling. Below is simplified code to illustrate the problem. I don't understand:

  1. Why isn't "attrib" included in the pickle dump/load? What do I have to do so that it is included?
  2. Why isn't __getstate__() called during dump so that I can add the missing "attrib"? __setstate__() was called, so where did the state passed to it come from? My plan was to add "attrib" to the state returned by __getstate__() so that I could restore it in __setstate__().
import numpy as np
import pickle

class Xndarray(np.ndarray):
    def __new__(cls, **kwargs):
        return super().__new__(cls, (5, 3), **kwargs)

    def __init__(self, **kwargs):
        self[...] = -1
        self.attrib = 0

    def add2getstate(self):
        print("add2getstate()", self.__dict__)   

    def __getstate__(self):                         # This never gets called
        print("__getstate__()")
        return super().__getstate__()

    def __setstate__(self, data):
        print("__setstate__()")
        super().__setstate__(data)


if __name__ == "__main__":
    fname = "fname.pkl"

    x = Xndarray()

    x[0] = 0
    x.attrib += 2

    print(x)
    x.add2getstate()
    print(x.attrib)

    with open(fname, "wb") as fh:
        pickle.dump(x, fh)

    print("---------------")

    with open(fname, "rb") as fh:
        y = pickle.load(fh)

    print(y)
    y.add2getstate()
    print(y.attrib)

Here is the output:

[[ 0.  0.  0.]
 [-1. -1. -1.]
 [-1. -1. -1.]
 [-1. -1. -1.]
 [-1. -1. -1.]]
add2getstate() {'attrib': 2}
2
---------------
__setstate__()
[[ 0.  0.  0.]
 [-1. -1. -1.]
 [-1. -1. -1.]
 [-1. -1. -1.]
 [-1. -1. -1.]]
add2getstate() {}
Traceback (most recent call last):
  File "./t.py", line 48, in <module>
    print(y.attrib)
AttributeError: 'Xndarray' object has no attribute 'attrib'
caread
  • I have additional attributes related to the numpy arrays that I wish to save with the numpy arrays. Additionally, the numpy arrays are part of a larger data structure that is serialized. – caread Feb 01 '19 at 19:55

4 Answers

__getstate__ is only called if your object uses the default __reduce__/__reduce_ex__. numpy.ndarray has its own __reduce__ implementation that does not call your __getstate__.

numpy.ndarray.__reduce__ only includes the object data it knows about, not self.attrib, and numpy.ndarray.__setstate__ would not know how to set self.attrib even if you included that attribute somehow.
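
One way to check this is to look at what the inherited `__reduce__` hands to pickle. The following is a minimal sketch that reuses a trimmed copy of the question's class; the names `reconstruct`, `args`, and `state` are labels chosen here for illustration, not NumPy API names.

```python
import numpy as np


class Xndarray(np.ndarray):
    def __new__(cls, **kwargs):
        return super().__new__(cls, (5, 3), **kwargs)

    def __init__(self, **kwargs):
        self[...] = -1
        self.attrib = 0


x = Xndarray()

# pickle takes its instructions from __reduce_ex__/__reduce__, which are
# inherited from ndarray here: a callable, its arguments, and a state tuple.
reconstruct, args, state = x.__reduce__()

print(args[0])     # the class that will be rebuilt on load
print(state[1])    # the array shape stored in the state
print(x.__dict__)  # 'attrib' lives here, and the state does not include it
```

Because the reconstruction callable builds an empty array of the subclass without running `__init__`, nothing on the load side recreates `attrib` either.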

You will need to implement your own __reduce__ and __setstate__ and handle self.attrib yourself.
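
A minimal sketch of that approach, assuming the state tuple returned by `ndarray.__reduce__` can simply be extended with one extra element and trimmed again on load. The class body otherwise follows the question; treat it as one possible arrangement rather than the only way to do it.

```python
import pickle

import numpy as np


class Xndarray(np.ndarray):
    def __new__(cls, **kwargs):
        return super().__new__(cls, (5, 3), **kwargs)

    def __init__(self, **kwargs):
        self[...] = -1
        self.attrib = 0

    def __reduce__(self):
        # Start from ndarray's own (callable, args, state) triple ...
        reconstruct, args, state = super().__reduce__()
        # ... and append the custom attribute to the end of the state tuple.
        return (reconstruct, args, state + (self.attrib,))

    def __setstate__(self, state):
        # The last element is the one appended in __reduce__;
        # everything before it is the state ndarray expects.
        self.attrib = state[-1]
        super().__setstate__(state[:-1])


x = Xndarray()
x[0] = 0
x.attrib += 2

y = pickle.loads(pickle.dumps(x))
print(type(y).__name__, y.attrib)
```

If the subclass carries several attributes, appending `self.__dict__` instead of a single value and calling `self.__dict__.update(...)` in `__setstate__` is a common variation on the same idea.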

user2357112

NumPy arrays implement __reduce__, not __getstate__.

See https://docs.python.org/2/library/pickle.html#pickling-and-unpickling-extension-types

Mihai Andrei
  • Thank you (and also to user2357112). I was able to get my code to work using __reduce__ and changing __setstate__. When I searched for __reduce__ examples, I found my question already answered in https://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array. – caread Feb 02 '19 at 03:06
I don't know why you implemented __getstate__ at all. Try removing __getstate__ and then the pickle should be complete.

There's no self.attrib in the pickle because you implement your own __getstate__ and return just the __getstate__ result from the class you inherit from.

tbzk
  • I implemented __getstate__ to demonstrate that it was not called during pickle dump. Removing it has no impact on the output. – caread Feb 01 '19 at 19:51
  • Please try this somewhere and show the output `print(super().__getstate__())` – tbzk Feb 01 '19 at 19:56
  • ```Traceback (most recent call last): File "./np_pkl_tst.py", line 41, in print(x.pr()) File "./np_pkl_tst.py", line 18, in pr print(super().__getstate__()) AttributeError: 'super' object has no attribute '__getstate__'``` – caread Feb 01 '19 at 20:00
  • Yeah, the return value is definitely `False`. This is from the Python docs: "If __getstate__() returns a false value, the __setstate__() method will not be called upon unpickling." – tbzk Feb 01 '19 at 20:17
Use dill instead of pickle: https://pypi.org/project/dill/. Because dill extends pickle, the interface is pretty much the same.

import dill

# x and fname are the objects defined in the question's code
with open(fname, "wb") as fh:
    dill.dump(x, fh)

print("---------------")

with open(fname, "rb") as fh:
    y = dill.load(fh)
SKG
  • Using dill got my posted test case to work. It didn't work on my full project, where an instance of the subclassed numpy object is assigned to an attribute of a second object and that second object is pickled. To get that to work I had to use __reduce__ and __setstate__. – caread Feb 02 '19 at 03:13