How to create an array-like class compatible with NumPy ufuncs?

Question

I'm trying to implement Automatic Differentiation using a class that behaves like a NumPy array. It does not subclass numpy.ndarray, but contains two array attributes. One for the value, and one for the Jacobian matrix. Every operation is overloaded to operate both on the value and the Jacobian. However, I have trouble making NumPy ufuncs (e.g., np.log) work on my custom "array".

I have created the following minimal example, illustrating the issue. Two is supposed to be a radiation hardened version of a NumPy array, which computes everything twice, and makes sure the results are equal.

It must be support indexing, element-wise logarithm, and length. Just like a normal ndarray. The element-wise logarithm works fine when called using x.cos(), but does something unexpected when called using np.cos(x).

from __future__ import print_function
import numpy as np

class Two(object):
    def __init__(self, val1, val2):
        print("init with", val1, val2)
        assert np.array_equal(val1, val2)
        self.val1 = val1
        self.val2 = val2

    def __getitem__(self, s):
        print("getitem", s, "got", Two(self.val1[s], self.val2[s]))
        return Two(self.val1[s], self.val2[s])

    def __repr__(self):
        return "<<{}, {}>>".format(self.val1, self.val2)

    def log(self):
        print("log", self)
        return Two(np.log(self.val1), np.log(self.val2))

    def __len__(self):
        print("len", self, "=", self.val1.shape[0])
        return self.val1.shape[0]

x = Two(np.array([1,2]).T, np.array([1,2]).T)

Indexing returns the relevant elements from both attributes, as expected:

>>> print("First element in x:", x[0], "\n")
init with [1 2] [1 2]
init with 1 1
getitem 0 got <<1, 1>>
init with 1 1
First element in x: <<1, 1>>

Element-wise logarithm works just fine when called using x.cos():

>>> print("--- x.log() ---", x.log(), "\n")
log <<[1 2], [1 2]>>
init with [ 0.  0.69314] [ 0.  0.69314]
--- x.log() --- <<[ 0.  0.69314], [ 0.   0.69314]>>

However, np.log(x) doesn't work as expected. It realizes the object has a length, so it extracts every item and takes the logarithm on each one, then returns an array of Two objects (dtype=object).

>>> print("--- np.log(x) with len ---", np.log(x), "\n") # WTF
len <<[1 2], [1 2]>> = 2
len <<[1 2], [1 2]>> = 2
init with 1 1
getitem 0 got <<1, 1>>
init with 1 1
init with 2 2
getitem 1 got <<2, 2>>
init with 2 2
len <<[1 2], [1 2]>> = 2
len <<[1 2], [1 2]>> = 2
init with 1 1
getitem 0 got <<1, 1>>
init with 1 1
init with 2 2
getitem 1 got <<2, 2>>
init with 2 2
len <<[1 2], [1 2]>> = 2
len <<[1 2], [1 2]>> = 2
init with 1 1
getitem 0 got <<1, 1>>
init with 1 1
init with 2 2
getitem 1 got <<2, 2>>
init with 2 2
log <<1, 1>>
init with 0.0 0.0
log <<2, 2>>
init with 0.693147 0.693147
--- np.log(x) with len --- [<<0.0, 0.0>> <<0.693147, 0.693147>>]

If Two has no length method, it works just fine:

>>> del Two.__len__
>>> print("--- np.log(x) without len ---", np.log(x), "\n")
log <<[1 2], [1 2]>>
init with [ 0.          0.69314718] [ 0.   0.693147]
--- np.log(x) without len --- <<[ 0.   0.693147], [ 0.          0.693147]>>

How can I create a class that meets the requirements (getitem, log, len)? I researched subclassing ndarray, but that seemed to be more complicated than it is worth.

Also, I could not find the location in the NumPy source code where x.__len__ is accessed, so I'm interested in that too.

Edit: I'm using miniconda2 with Python 2.7.11 and NumPy 1.11.0.

Look at how `np.ma` implements masked arrays. It's subclass, but contains 2 arrays, the data and the mask. Most of the code is about creating and coordinating the 2 attributes. Most calculations are performed by compressing or filling the data so regular array methods work. `scipy.sparse` is another example of array like objects that contain arrays. But it implements most of its own calculations, with key ones like matrix product coded in C. — hpaulj, May 30 '16 at 17:59
The details are still being fleshed out, but take a look at [this NEP](http://docs.scipy.org/doc/numpy-dev/neps/ufunc-overrides.html), which tries to solve the exact problem you are describing. I think it is partly implemented in one of the latest numpy releases, although it is experimental and likely to change in the near future. — Jaime, May 30 '16 at 22:54
Thanks. `np.ma` seems incredibly complicated (and is a subclass), but `scipy.sparse` looks better. It does not subclass `ndarray` and avoids the problems above by not implementing `__len__`, but this is probably acceptable for me, since users can use `shape[0]`. If the NEP is implemented all my problems would go away it seems. — roessland, May 31 '16 at 08:36
`/usr/lib/python3/dist-packages/scipy/sparse/base.py` has the code for ` __numpy_ufunc` as described in the NEP. It's because of that that `np.dot(A,A)` works for sparse matrices. — hpaulj, Jun 01 '16 at 01:23

How to create an array-like class compatible with NumPy ufuncs?

0 Answers0