
Before you flag this as a duplicate: I have read this page and many others, and I still haven't found a solution to my problem.

This is the problem I'm having: given two 2D arrays, I want to apply a function F row by row over both arrays, where F takes two 1D arrays as input.

import numpy as np
a = np.arange(15).reshape([3,5])
b = np.arange(30, step=2).reshape([3,5])

# what is the 'numpy' equivalent of the following?
np.array([np.dot(x,y) for x,y in zip(a,b)])

Please note that np.dot is just for demonstration. The real question is about any generic function F that operates on a pair of 1D arrays.

  • np.vectorize either fails outright with an error or applies the function element by element instead of array by array (or row by row).
  • np.apply_along_axis applies the function iteratively; for example, with the variables defined above, it computes F(a[0], b[0]) and combines this with F(a[0], b[1]) and F(a[0], b[2]). This is not what I'm looking for; ideally, it would stop at just F(a[0], b[0]).
  • Indexing / advanced indexing doesn't do what I want either. For one, if I do something like np.dot(a[np.arange(3)], b[np.arange(3)]), it throws a ValueError saying that shapes (3,5) and (3,5) are not aligned, and I don't know how to fix this.

I've tried to solve this every way I could, and the only working solution I've come up with is the list comprehension above. But I'm worried about its performance cost, so I would like to achieve the same effect with a NumPy operation, if possible. How do I do this?

Manuel
  • If you need to vectorize, we need to see the actual implementation; no shortcuts there. `np.apply_along_axis` is still mostly a loop; it doesn't give you a noticeable improvement in most cases. – Divakar May 29 '17 at 09:55
  • You already have the inner loop executed by NumPy; are you sure you need to vectorize (the outer loop)? – gboffi May 29 '17 at 10:01
  • The code I wrote is basically the implementation itself. OK, maybe it's not the *actual* implementation, but it's easier to deal with. The actual one just has bigger arrays that are initialized differently. – Manuel May 29 '17 at 10:05
  • @gboffi, wouldn't it be better? I used small arrays in my post, but this needs to scale to much larger arrays, so I'm trying to avoid Python list comprehensions if possible. – Manuel May 29 '17 at 10:07
  • If it is an inner, outer or mixed product, it can easily be solved using `np.einsum()`, but you stated that it's not a dot product, so to be able to provide a solution it is _necessary_ to know something about `F`; knowing only that it's a generic function that accepts two 1D arrays of the same shape is vague, or at least it seems so to me. Possibly my fault... – gboffi May 29 '17 at 10:10
  • Re inner vs outer loops: if your arrays are `(1000000,3)`, optimizing the outer loop may be sensible, but if your arrays are `(3,1000000)`, well, I dare say you are saving large chunks of almost nothing... Of course your case may be intermediate, meaning that you need a vectorized solution to find out what's better. – gboffi May 29 '17 at 10:13

2 Answers


This type of question has been beaten to death on SO, but I'll try to illustrate the issues with your framework:

In [1]: a = np.arange(15).reshape([3,5])
   ...: b = np.arange(30, step=2).reshape([3,5])
   ...: 
In [2]: def f(x,y):
   ...:     return np.dot(x,y)

zipped comprehension

The list comprehension approach applies f to the 3 rows of a and b. That is, it iterates on the 2 arrays as though they were lists. At each call, your function gets 2 1d arrays. dot can accept other shapes, but for the moment we'll pretend it only works with a pair of 1ds.

In [3]: np.array([f(x,y) for x,y in zip(a,b)])
Out[3]: array([  60,  510, 1460])
In [4]: np.dot(a[0],b[0])
Out[4]: 60

vectorize/frompyfunc

np.vectorize iterates over the inputs (with broadcasting, which can be handy) and gives the function scalar values. I'll illustrate with frompyfunc, which returns an object dtype array (and is used by vectorize):

In [5]: vf = np.frompyfunc(f, 2,1)
In [6]: vf(a,b)
Out[6]: 
array([[0, 2, 8, 18, 32],
       [50, 72, 98, 128, 162],
       [200, 242, 288, 338, 392]], dtype=object)

So the result is a (3,5) array; incidentally, summing across columns gets the desired result:

In [9]: vf(a,b).sum(axis=1)
Out[9]: array([60, 510, 1460], dtype=object)

np.vectorize does not make any speed promises.
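
As an aside, and only a hedged sketch assuming NumPy 1.12 or later: np.vectorize also accepts a signature argument that hands the function whole rows instead of scalars. Under the hood it is still a Python-level loop, so it makes no speed promises either.

vf2 = np.vectorize(f, signature='(n),(n)->()')   # f now receives (5,) rows, not scalars
vf2(a, b)                                        # array([  60,  510, 1460])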

apply_along_axis

I don't know how you tried to use apply_along_axis. It only takes one array. After a lot of setup it ends up doing (for a 2d array like a):

for i in range(3):
    idx = (i, slice(None))
    outarr[idx] = asanyarray(func1d(arr[idx], *args, **kwargs))

For 3d and larger it makes iteration over the 'other' axes simpler; for 2d it is overkill. In any case it does not speed up the calculations. It is still iteration.

(apply_along_axis takes arr and *args; it iterates on arr, but passes *args through whole.)
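
To make the single-array point concrete, a small sketch of what apply_along_axis is actually meant for; an extra array passed through *args is handed to every call whole, not row by row:

np.apply_along_axis(np.sum, 1, a)     # array([10, 35, 60]); sums each row of one array
# np.apply_along_axis(f, 1, a, b) would call f(a[i], b), i.e. dot a (5,) row with
# the whole (3,5) b - a shape mismatch, not the row pairing you want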

indexing

np.dot(a[np.arange(3)], b[np.arange(3)])

is the same as

np.dot(a, b)

dot is a matrix product: (3,5) works with (5,3) to produce a (3,3). It handles 1d as a special case (see the docs); (5,) with (5,) produces a scalar.
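
If computing the full matrix product is acceptable, the row-by-row dot products sit on the diagonal of a.dot(b.T); a sketch, which does three times more work than needed here but stays entirely in compiled code:

np.diag(np.dot(a, b.T))      # array([  60,  510, 1460])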

iteration

For a truly generic f(x,y), your only alternative to the zipped list comprehension is an index loop like this:

In [18]: c = np.zeros((a.shape[0]))
In [19]: for i in range(a.shape[0]):
    ...:    c[i] = f(a[i,:], b[i,:])
In [20]: c
Out[20]: array([   60.,   510.,  1460.])

Speed will be similar. (That loop can be moved to compiled code with Cython, but I don't think you are ready to dive that deep.)

As noted in a comment, if the arrays are (N,M), and N is small compared to M, this iteration is not costly. That is, a few loops over a big task are ok. They may even be faster if they simplify large array memory management.
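
If you want to check where your real sizes fall, a quick timing sketch (the shapes below are invented; compare the loop against the einsum version shown in the next section):

import timeit
big_a = np.random.rand(1000, 10000)
big_b = np.random.rand(1000, 10000)
timeit.timeit(lambda: np.array([f(x, y) for x, y in zip(big_a, big_b)]), number=10)
timeit.timeit(lambda: np.einsum('ij,ij->i', big_a, big_b), number=10)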

best

The ideal solution is to rewrite the generic function so that it works with 2d arrays, using numpy's compiled functions.

In the matrix multiplication case, einsum has implemented a generalized form of 'sum-of-products' in compiled code:

In [22]: np.einsum('ij,ij->i',a,b)
Out[22]: array([  60,  510, 1460])

matmul also generalizes the product, but works best with 3d arrays:

In [25]: a[:,None,:]@b[:,:,None]    # needs reshape
Out[25]: 
array([[[  60]],

       [[ 510]],

       [[1460]]])
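
About the "needs reshape" comment: the matmul result has shape (3,1,1), so flattening it (one possible sketch) recovers the 1d answer:

(a[:,None,:] @ b[:,:,None]).ravel()   # array([  60,  510, 1460])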
hpaulj

Stay away from generic functions if you want a fast solution with NumPy. Even though NumPy has some capabilities to hide Python loops, the loops are still present (inside the function), and these solutions aren't really fast (at least compared to native NumPy functions).

What you should do is find a function in NumPy, SciPy, ... that does what you need. These functions are fast, but it sometimes takes a bit of searching and/or experimentation until you find a match.

For example, a vector dot product is just the row-wise sum of the element-wise multiplication:

np.sum(a * b, axis=1)        # array([  60,  510, 1460])

np.einsum('ij,ij->i', a, b)  # array([  60,  510, 1460])
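
The same idea carries over to other functions. For instance, if F happened to be a row-wise Euclidean distance (just an invented example), the whole-array formulation would be:

np.sqrt(((a - b)**2).sum(axis=1))    # one distance per row, no Python loop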
MSeifert