
I want to be able to save my array subclass to a npy file, and recover the result later.

Something like:

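>>> import numpy as np
>>> class MyArray(np.ndarray): pass
>>> data = np.arange(10).view(MyArray)  # view casting; MyArray(...) would treat its argument as a shape
>>> np.save('fname.npy', data)
>>> data2 = np.load('fname.npy')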
>>> assert isinstance(data2, MyArray)  # raises AssertionError

The docs say (emphasis mine):

The format explicitly does not need to:

  • [...]
  • Fully handle arbitrary subclasses of numpy.ndarray. Subclasses will be accepted for writing, but only the array data will be written out. A regular numpy.ndarray object will be created upon reading the file. The API can be used to build a format for a particular subclass, but that is out of scope for the general NPY format.

So is it possible to make the above code not raise an AssertionError?

Eric
  • Are you asking how to store data in the `npy` file so that numpy knows to use your subclass when reading the data back in (via numpy.load)? Is a solution where you use [view casting](http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#view-casting) _after_ reading the data as a vanilla numpy array OK? – mgilson Aug 08 '16 at 22:12
  • @mgilson: View casting is not quite what I'm after. I'd like the file, not the programmer, to encode which class it should be viewed as. Also, ideally I'd be able to store some metadata of my own corresponding to properties on my class. – Eric Aug 08 '16 at 22:18

1 Answer


I don't see evidence that np.save handles array subclasses.

I tried to save an np.matrix with it, and got back an ndarray.

I tried to save an np.ma array, and got an error:

NotImplementedError: MaskedArray.tofile() not implemented yet.

Saving is done by np.lib.npyio.format.write_array, which first calls

_write_array_header()   # save dtype, shape etc

If the dtype is object, it then uses pickle.dump(array, fp ...).

Otherwise it calls array.tofile(fp), which writes only the raw data buffer. Nothing about the array's class ends up in the file.
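As a quick check of the behaviour described above, here is a minimal sketch that round-trips a subclass through np.save and np.load using an in-memory buffer. The MyArray class is just a stand-in for your own subclass, and io.BytesIO is used only to avoid touching the disk.

```python
import io

import numpy as np


class MyArray(np.ndarray):
    """Hypothetical subclass, used only for illustration."""


# View casting gives a MyArray that shares the data of the plain array.
data = np.arange(10).view(MyArray)

# np.save accepts a file-like object; only the header and the raw
# element data are written, not the class of the array.
buf = io.BytesIO()
np.save(buf, data)
buf.seek(0)
loaded = np.load(buf)

print(type(data).__name__)    # MyArray
print(type(loaded).__name__)  # ndarray
```

The values survive the round trip, but the subclass does not, which matches the passage quoted from the docs.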

I think pickle.dump of an array ends up using np.save, but I don't recall how that's triggered.

I can for example pickle an array, and load it:

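In [657]: f=open('test','wb')
In [658]: pickle.Pickler(f).dump(x)   # x is the array (or subclass instance) under test
In [659]: f.close()
In [660]: np.load('test', allow_pickle=True)   # not an npy file, so np.load falls back to pickle; recent numpy needs allow_pickle=True
In [664]: f=open('test','rb')
In [665]: pickle.load(f)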

This pickle dump/load sequence works for my test cases with np.ma, np.matrix and sparse.coo_matrix. So that's probably the direction to explore for your own subclass.
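For a bare subclass with no extra attributes, the same dump/load sequence can be sketched as follows. MyArray is again a hypothetical stand-in. Note that pickle stores the class by reference, so the class has to be importable (or defined in the running script) when loading.

```python
import pickle

import numpy as np


class MyArray(np.ndarray):
    """Hypothetical subclass, used only for illustration."""


data = np.arange(10).view(MyArray)

# Round trip through pickle instead of np.save / np.load.
payload = pickle.dumps(data)
restored = pickle.loads(payload)

print(type(restored).__name__)  # MyArray
```

Unlike the npy format, the pickle records which class to rebuild, so the isinstance check from the question passes here.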

Searching on numpy and pickle, I found "Preserve custom attributes when pickling subclass of numpy array". The answer there involves custom .__reduce__ and .__setstate__ methods.
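A minimal sketch of that approach, assuming a subclass with a single metadata attribute. The InfoArray class and its info attribute are made up for illustration; the idea is to append the extra attribute to the state tuple that ndarray.__reduce__ produces, and to peel it off again in __setstate__.

```python
import pickle

import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical subclass carrying one extra attribute, `info`."""

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; propagate the attribute if present.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)

    def __reduce__(self):
        # ndarray.__reduce__ returns (callable, args, state);
        # extend the state tuple with our own attribute.
        reconstruct, args, state = super().__reduce__()
        return (reconstruct, args, state + (self.info,))

    def __setstate__(self, state):
        # The last item is ours; the rest belongs to ndarray.
        self.info = state[-1]
        super().__setstate__(state[:-1])


arr = InfoArray(np.arange(5), info="metres")
restored = pickle.loads(pickle.dumps(arr))

print(type(restored).__name__, restored.info)  # InfoArray metres
```

With more attributes, the same pattern extends by appending a tuple or dict of them to the state instead of a single value.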

hpaulj