
I'm hoping someone can help me debug an issue we're seeing with subclassed ndarrays in Spark. Specifically, when a subclassed array is broadcast, it seems to lose the extra information. A trivial example is below:

>>> import numpy as np
>>> 
>>> class Test(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         obj = np.asarray(input_array).view(cls)
...         obj.info = info
...         return obj
...     
...     def __array_finalize__(self, obj):
...         if not hasattr(self, "info"):
...             self.info = getattr(obj, 'info', None)
...         else:
...             print("has info attribute: %s" % getattr(self, 'info'))
... 
>>> test = Test(np.array([[1,2,3],[4,5,6]]), info="info")
>>> print(test.info)
info
>>> print(sc.broadcast(test).value)
[[1 2 3]
 [4 5 6]]
>>> print(sc.broadcast(test).value.info)
None
David
  • This thread solved it: http://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array – David Apr 04 '17 at 03:11
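
For reference, the approach in that linked thread is to override __reduce__ and __setstate__ so the extra attribute travels through pickle, which is the serialization sc.broadcast relies on. A minimal sketch, reusing the Test class from the question:

```python
import pickle
import numpy as np

class Test(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append the custom attribute to ndarray's pickled state tuple.
        pickled_state = super().__reduce__()
        return (pickled_state[0], pickled_state[1], pickled_state[2] + (self.info,))

    def __setstate__(self, state):
        # Peel our attribute off the end, then let ndarray restore the rest.
        self.info = state[-1]
        super().__setstate__(state[:-1])

test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")
restored = pickle.loads(pickle.dumps(test))
print(restored.info)  # info
```

With this in place, sc.broadcast(test).value.info should survive the round trip as well, since broadcast pickles the value under the hood.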

1 Answer

At a minimum, you have a small typo: you're checking hasattr(obj, "info") when you should be checking hasattr(self, "info"). Because the check is flipped, info isn't being carried over.

test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")
print(test.info)   # info
test2 = test[1:]
print(test2.info)  # info
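
Note that this only covers views and slices, where __array_finalize__ runs with the source array as obj. sc.broadcast serializes with pickle, and ndarray's default pickling does not include custom attributes, so info still comes back as None after a round trip. A sketch illustrating the difference, using a simplified version of the question's class:

```python
import pickle
import numpy as np

class Test(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is None for explicit construction, e.g. during unpickling
        self.info = getattr(obj, 'info', None)

test = Test(np.array([[1, 2, 3], [4, 5, 6]]), info="info")

view = test[1:]                               # slice: info survives
roundtrip = pickle.loads(pickle.dumps(test))  # pickle: info is lost
print(view.info)       # info
print(roundtrip.info)  # None
```

This is why the pickling fix from the thread linked in the comments is needed for broadcast, even after the __array_finalize__ logic is corrected.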
Jeff Tratner