Testing Numpy operations

Question

Whenever I need to test a moderately complex numpy expression, say,

c = np.multiply.outer(a, b)
d = np.einsum('kjij->ijk', c)

I end up doings hacks such as, e.g., setting a and bthus

a = np.arange(9).reshape(3,3)
b = a / 10

so that I can then track what d contains.

This is ugly and not very convenient. Ideally, I would be able to do something like the following:

a = np.array(list("abcdefghi")).reshape(3,3)
b = np.array(list("ABCDEFGHI")).reshape(3,3)

c = np.add.outer(a, b)
d = np.einsum('kjij->ijk', c)

so that, e.g., d[0, 1, 2] could be seen to correspond to 'hB', which is much clearer than .7 (which is what the other assignment to a and b would give.) This cannot be done, because the ufunc add does not take characters.

In summary, once I start chaining a few transformations (an outer product, an einsum, broadcasting or slicing, etc.) I lose track and need to see for myself what my transformations are actually doing. That's when I need to run a few examples, and that's where my current method of doing so strikes me as suboptimal. Is there any standard, or better, way to do this?

Sorry, you'll explain your problem more. I don't see what's hackie. Accepted numpy SO answers do that kind of test/verification all the time. — hpaulj, May 17 '18 at 10:53
I understand hack implied ugly, but I still don't see what's wrong. Nor what you are trying to do with strings. — hpaulj, May 17 '18 at 11:05
Thanks. I have edited my question; hopefully what I am after will now be clearer. — Schiphol, May 17 '18 at 11:20
I think the answer to your question might be "reading the manual" — Martin Thoma, May 17 '18 at 11:26
Thanks, @MartinThoma. What part of the manual would that be? I have read most of it multiple times. — Schiphol, May 17 '18 at 11:31
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.einsum.html - If I have time after work, I will see if it is unclear and propose to add something. Or maybe you want to do a pull request? — Martin Thoma, May 17 '18 at 11:37
Thanks again :) It's not really that I don't understand `einsum`. I do. It's rather than once I start chaining a few transformations (an outer product, an einsum, broadcasting or slicing, etc.) I loose track and need to see for myself what my transformation are actually doing. That's when I need to run a few examples, and that's where my current method of doing so strikes me as suboptimal. Perhaps making reference to `einsum` in the question is distracting? — Schiphol, May 17 '18 at 12:17
I have added part of my last comment to the question, in the hope that it will be clarificatory. — Schiphol, May 17 '18 at 12:23

hpaulj · Accepted Answer · 2018-05-18T00:00:29.983

In [454]: a = np.array(list("abcdefghi")).reshape(3,3)
     ...: b = np.array(list("ABCDEFGHI")).reshape(3,3)

np.add can't be used because add has not been defined for the string dtype:

In [455]: c = np.add.outer(a,b)
....
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

But np.char has functions that apply Python string methods to ndarray elements (these aren't fast, just convenient):

Signature: np.char.add(x1, x2)
Docstring:
Return element-wise string concatenation for two arrays of str or unicode.

Using broadcasting I can perform your outer string concatenation:

In [457]: c = np.char.add(a[:,:,None,None], b[None,None,:,:])
In [458]: c.shape
Out[458]: (3, 3, 3, 3)
In [459]: c
Out[459]: 
array([[[['aA', 'aB', 'aC'],
         ['aD', 'aE', 'aF'],
         ['aG', 'aH', 'aI']],

        [['bA', 'bB', 'bC'],
         ['bD', 'bE', 'bF'],
         ['bG', 'bH', 'bI']],

        ....
        [['iA', 'iB', 'iC'],
         ['iD', 'iE', 'iF'],
         ['iG', 'iH', 'iI']]]], dtype='<U2')

I was skeptical that einsum could handle this array, since normally einsum is used for np.dot like sum-of-products calculations. But with this indexing, it is just selecting a diagonal and rearranging axes, so it does work:

In [460]: np.einsum('kjij->ijk', c)
Out[460]: 
array([[['aA', 'dA', 'gA'],
        ['bB', 'eB', 'hB'],
        ['cC', 'fC', 'iC']],

       [['aD', 'dD', 'gD'],
        ['bE', 'eE', 'hE'],
        ['cF', 'fF', 'iF']],

       [['aG', 'dG', 'gG'],
        ['bH', 'eH', 'hH'],
        ['cI', 'fI', 'iI']]], dtype='<U2')

The d from the numeric test case:

array([[[0. , 3. , 6. ],
        [1.1, 4.1, 7.1],
        [2.2, 5.2, 8.2]],

       [[0.3, 3.3, 6.3],
        [1.4, 4.4, 7.4],
        [2.5, 5.5, 8.5]],

       [[0.6, 3.6, 6.6],
        [1.7, 4.7, 7.7],
        [2.8, 5.8, 8.8]]])

The pattern with these numeric values is just as clear as with strings.

I like to use distinct array shapes where possible, because it makes tracking dimensions across changes easier:

In [496]: a3 = np.arange(1,13).reshape(4,3)
     ...: b3 = np.arange(1,7).reshape(2,3) / 10
In [497]: c3 = np.add.outer(a3,b3)
In [498]: d3 = np.einsum('kjij->ijk', c3)
In [499]: c3.shape
Out[499]: (4, 3, 2, 3)
In [500]: d3.shape
Out[500]: (2, 3, 4)
In [501]: d3
Out[501]: 
array([[[ 1.1,  4.1,  7.1, 10.1],
        [ 2.2,  5.2,  8.2, 11.2],
        [ 3.3,  6.3,  9.3, 12.3]],

       [[ 1.4,  4.4,  7.4, 10.4],
        [ 2.5,  5.5,  8.5, 11.5],
        [ 3.6,  6.6,  9.6, 12.6]]])

This, for example, would raise an error if I try ''kjik->ijk'.

With numeric values I can perform the multiply.outer with einsum:

In [502]: c4 = np.multiply.outer(a3,b3)
In [503]: np.allclose(c4,np.einsum('ij,kl',a3,b3))
Out[503]: True
In [504]: d4 = np.einsum('kjij->ijk', c4)
In [505]: np.allclose(d4,np.einsum('kj,ij->ijk',a3,b3))
Out[505]: True
In [506]: d4
Out[506]: 
array([[[0.1, 0.4, 0.7, 1. ],
        [0.4, 1. , 1.6, 2.2],
        [0.9, 1.8, 2.7, 3.6]],

       [[0.4, 1.6, 2.8, 4. ],
        [1. , 2.5, 4. , 5.5],
        [1.8, 3.6, 5.4, 7.2]]])

That 'kj,ij->ijk' gives me a better of idea of what is happening than the d display.

Another way to put it:

(4,3) + (2,3) => (2,3,4)

Thanks a lot, that is very useful. You are right that this technique won't work when you actually need to multiply arrays. It would be very neat if there was a way to run numpy operations on symbolic arrays as a way of testing behavior. — Schiphol, May 18 '18 at 07:43
There is a symbolic math module, `sympy`. It can convert its symbolic expressions into `numpy` equivalents, but it can't run `numpy` code. — hpaulj, May 18 '18 at 16:45

Testing Numpy operations

1 Answers1