Saving subclass of numpy ndarray after storing it on disk

Question

Suppose I have an array that is an instance of a class subclassed from np.ndarray:

import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        print('in reduce')
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        print('in set_state')
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])
        
    def tofile(self, fid, sep="", format="%s"):
        super().tofile(fid, sep, format)
        print('in tofile')
        
    def tobytes(self, order='C'):
        # Return the parent's result; without this the override returns None
        data = super().tobytes(order)
        print('in tobytes')
        return data

array = RealisticInfoArray(np.zeros((7, 9, 13)), info='tester')

The methods __reduce__, __setstate__, tofile and tobytes are included because I think they are involved in the saving I want to perform: I want to store the array on disk (via any of np.save, np.savez or np.savez_compressed) and load it back while preserving the class of the object and all of its custom attributes.

I've already tried the approach from another SO question, but it does not work for me because I want to use np functions, not pickle or dill. I also borrowed the subclass for this MWE from there.

Another bit of information: the actual saving is performed by np.lib.npyio.format.write_array, which does not seem to allow customizing how the data is stored.

So, my question is whether it is possible to preserve the class of a stored array and if yes, how to do so?

hpaulj · Answer 1 · 2020-09-27T19:19:15.960

What do existing subclasses such as np.matrix, np.recarray and np.ma.MaskedArray do? I have not tested their save or looked to see if they have special code. matrix has the same attributes and data buffer as ndarray, just methods that restrict dimensions. A masked array has both data and mask array attributes. recarray is like a structured array with a different indexing method.

scipy.sparse has its own save, which uses savez to save its own attributes. That requires its own load too, to recreate the matrix. sparse though is not a subclass.
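The same pattern could be adapted to an ndarray subclass. The following is only a minimal sketch, not something NumPy provides: InfoArray, save_info_array and load_info_array are hypothetical names, and it assumes the info attribute is a plain string so that nothing needs to be pickled.

```python
import os
import tempfile
import numpy as np

class InfoArray(np.ndarray):
    """Minimal ndarray subclass with one extra attribute (hypothetical stand-in)."""
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

def save_info_array(path, arr):
    # Hypothetical helper: store the raw buffer and the attribute as two entries.
    # Assumes arr.info is a string, which is saved as a 0-d unicode array.
    np.savez(path, data=np.asarray(arr), info=np.array(arr.info))

def load_info_array(path):
    # Hypothetical helper: rebuild the subclass from the two saved entries.
    with np.load(path) as npz:
        data = npz['data']
        info = npz['info'].item()  # 0-d unicode array -> Python str
    return InfoArray(data, info=info)

path = os.path.join(tempfile.mkdtemp(), 'info_array.npz')
original = InfoArray(np.zeros((7, 9, 13)), info='tester')
save_info_array(path, original)
restored = load_info_array(path)
print(type(restored).__name__, restored.info, restored.shape)
```

The cost is the same as for scipy.sparse: a plain np.load on that file returns the individual entries, so the class only comes back through the matching load function.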

I may do some exploring myself later.

But don't turn up your nose at pickle. There's a close connection between np.save and pickle. np.save is the pickle method for ndarray, while np.save itself uses pickle to serialize object dtype elements.
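The use of pickle for object elements can be seen with a small in-memory round trip. This is a sketch that assumes a NumPy version where np.load defaults to allow_pickle=False (1.16.3 or later); the variable names are illustrative only.

```python
import io
import numpy as np

# A 1-D object array, filled element by element to avoid shape inference.
obj_arr = np.empty(2, dtype=object)
obj_arr[0] = {'a': 1}
obj_arr[1] = [1, 2, 3]

buf = io.BytesIO()
np.save(buf, obj_arr)  # np.save allows pickling by default

buf.seek(0)
try:
    np.load(buf)  # np.load refuses pickled data by default
    loaded_without_pickle = True
except ValueError:
    loaded_without_pickle = False

buf.seek(0)
restored = np.load(buf, allow_pickle=True)  # explicit opt-in restores the objects
print(loaded_without_pickle, restored)
```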

The np.matrix subclass is not preserved by np.save (M here is a 2x2 np.matrix):

In [52]: np.save('matrix.npy',M)
In [53]: np.load('matrix.npy')
Out[53]: 
array([[1, 2],
       [3, 4]])

pickle preserves the subclass (demo omitted).
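A pickle round trip of that kind might look like the following sketch. The definition of M is an assumption chosen to match the values shown above, and the warning filter is only there because np.matrix emits a PendingDeprecationWarning.

```python
import pickle
import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter('ignore', PendingDeprecationWarning)
    M = np.matrix([[1, 2], [3, 4]])  # assumed stand-in for the M used above
    restored = pickle.loads(pickle.dumps(M))  # in-memory pickle round trip

print(type(restored))
```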

np.save isn't implemented for masked arrays:

In [66]: ma = np.ma.masked_array(np.ones(4))
In [67]: ma
Out[67]: 
masked_array(data=[1., 1., 1., 1.],
             mask=False,
       fill_value=1e+20)
In [68]: np.save('masked.npy',ma)
Traceback (most recent call last):
  File "<ipython-input-68-3c39e0b0fc22>", line 1, in <module>
    np.save('masked.npy',ma)
  File "<__array_function__ internals>", line 6, in save
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py", line 529, in save
    pickle_kwargs=dict(fix_imports=fix_imports))
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/format.py", line 675, in write_array
    array.tofile(fp)
  File "/usr/local/lib/python3.6/dist-packages/numpy/ma/core.py", line 6116, in tofile
    raise NotImplementedError("MaskedArray.tofile() not implemented yet.")
NotImplementedError: MaskedArray.tofile() not implemented yet.

Again pickle works:

In [69]: with open('masked.pkl','wb') as f:
    ...:     pickle.dump(ma,f)
    ...: 
In [70]: with open('masked.pkl','rb') as f:
    ...:     pp=pickle.load(f)
    ...: 
In [71]: pp
Out[71]: 
masked_array(data=[1.0, 1.0, 1.0, 1.0],
             mask=[False, False, False, False],
       fill_value=1e+20)
