
I'm having trouble getting a NumPy MaskedArray subclass to round-trip through pickle and preserve the extra subclass attributes. Here is an example:

import numpy as np
import pickle
from numpy import ma


class SubArray(np.ndarray):
    """Defines a generic np.ndarray subclass, that stores some metadata
    in the  dictionary `info`."""
    def __new__(cls, arr, info={}):
        x = np.asanyarray(arr).view(cls)
        x.info = info
        return x

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', {'ATTR': 'MISSING'})
        return


class MSubArray(SubArray, ma.MaskedArray):
    def __new__(cls, data, info={}, mask=ma.nomask, dtype=None):
        subarr = SubArray(data, info)
        _data = ma.MaskedArray.__new__(cls, data=subarr, mask=mask, dtype=dtype)
        _data.info = subarr.info
        return _data

    def __array_finalize__(self, obj):
        ma.MaskedArray.__array_finalize__(self, obj)
        SubArray.__array_finalize__(self, obj)
        return

ms = MSubArray([1, 2], info={'a': 1})
print('Pre-pickle:', ms.info, ms.data.info)

pkl = pickle.dumps(ms)
ms_from_pkl = pickle.loads(pkl)
print('Post-pickle:', ms_from_pkl.info, ms_from_pkl.data.info)

This produces:

Pre-pickle: {'a': 1} {'a': 1}
Post-pickle: {} {}

Any hints on what I'm doing wrong would be most appreciated!

Tom Aldcroft
  • Solution and some insight are found in this question http://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array – greedybuddha Nov 15 '16 at 17:24

1 Answer

Take a look at: https://mail.python.org/pipermail/python-list/2011-April/601275.html

The problem you're having is the result of pickling C extension types (ndarray subtypes). I'm not an expert on the inner workings of numpy arrays, but I would guess there's code somewhere which manually handles pickling numpy data. Since your SubArray class is a subclass of ndarray, you would need to override the pickling method(s) used by numpy.
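As a rough illustration of what overriding those methods can look like, here is a minimal sketch for a plain ndarray subclass. It assumes the only extra state is a single `info` dict; the class name `InfoArray` is made up for the example and is not from the question. It appends `info` to the state tuple that `ndarray.__reduce__` returns and strips it off again in `__setstate__`.

```python
import pickle
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical ndarray subclass carrying an `info` dict."""

    def __new__(cls, arr, info=None):
        x = np.asarray(arr).view(cls)
        x.info = {} if info is None else info
        return x

    def __array_finalize__(self, obj):
        # obj is None when the array is built from scratch (as in unpickling);
        # __setstate__ fills in `info` afterwards in that case.
        if obj is None:
            return
        self.info = getattr(obj, 'info', {})

    def __reduce__(self):
        # Take ndarray's (reconstructor, args, state) and append our attribute.
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.info,)

    def __setstate__(self, state):
        # The last element is ours; the rest belongs to ndarray.
        super().__setstate__(state[:-1])
        self.info = state[-1]


arr = InfoArray([1, 2], info={'a': 1})
restored = pickle.loads(pickle.dumps(arr))
print(type(restored).__name__, restored.info)
```

This only covers the plain ndarray case. `ma.MaskedArray` defines its own `__reduce__` and `__setstate__` with a different state layout, so a masked subclass would presumably need an analogous override on top of those.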

So, AFAIK your options are:

  1. Put a shim on pickle.dumps and pickle.loads (before it gets to the C-level)
  2. Update the numpy pickling methods so that they check for subclasses.
  3. Make all your classes sub-classes of object and have them contain the appropriate data that you need to be able to pickle / unpickle.

Option 3 is the way I would go if you're going to stick with using pickle for dumping and loading data. The pickle module was designed with pure-Python objects in mind, not extension types.
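Option 3 could be sketched roughly as follows: a plain Python class that holds a masked array and the metadata side by side, so pickle's default handling of instance attributes does the work. The class name `MaskedWithInfo` and its attribute names are assumptions made for the example, not anything from the question.

```python
import pickle
from numpy import ma


class MaskedWithInfo:
    """Hypothetical container: wraps a MaskedArray instead of subclassing it."""

    def __init__(self, data, mask=ma.nomask, info=None):
        self.array = ma.MaskedArray(data, mask=mask)
        # Copy so that instances never share one metadata dict.
        self.info = {} if info is None else dict(info)


obj = MaskedWithInfo([1, 2, 3], mask=[False, True, False], info={'a': 1})
restored_obj = pickle.loads(pickle.dumps(obj))
print(restored_obj.info, restored_obj.array)
```

The trade-off is that the container is no longer an array itself, so any array operations have to go through the wrapped `array` attribute.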

cronburg
  • This example was a much-simplified version of the real thing, which did correctly pickle/unpickle the numpy subclass. The real problem turned out to be in the equivalent of ``MSubArray.__new__``, which in the unpickle process gets passed a ``data`` object of type ``MSubArray`` which does not have the expected ``MSubArray`` attributes. This made it unhappy, but no longer. Thanks for the help that got me moving in the right direction. – Tom Aldcroft Dec 31 '13 at 02:03