4

I would like to use something like np.dot or (preferably) np.einsum to efficiently perform their same function but with an alternate ufunc instead of np.multiply. For example, consider these two arrays:

>>> a
array([[0, 1],
       [1, 1],
       [1, 0]])
>>> b
array([[0, 0],
       [1, 0],
       [1, 0],
       [0, 0]])

Now suppose I want to count the number of elements in each row of a equal to the corresponding elements in each row of b. I'd like to be able to do the equivalent of the following (note: the output below is fabricated but the values are what I would expect to see):

>>> np.dot(a, b.T, ufunc=np.equal)
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [1, 2, 2, 1]])

Is there a way to do this?

bogatron
  • 18,639
  • 6
  • 53
  • 47

3 Answers3

3

You can use the broadcasting from Divakar's answer together with numexpr:

numexpr.evaluate('sum(1*(a == b), axis=2)', {'a': a[:,None]})

The 1*() is a workaround. I have confirmed this doesn't allocate a big temporary array.

user6758673
  • 579
  • 2
  • 4
2

You could use broadcasting for such match counting problem -

(a[:,None] == b).sum(2)

Sample run -

In [36]: a
Out[36]: 
array([[0, 1],
       [1, 1],
       [1, 0]])

In [37]: b
Out[37]: 
array([[0, 0],
       [1, 0],
       [1, 0],
       [0, 0]])

In [38]: (a[:,None] == b).sum(2)
Out[38]: 
array([[1, 0, 0, 1],
       [0, 1, 1, 0],
       [1, 2, 2, 1]])

If you really want to employ np.einsum and np.equal, here's a way to mold the earlier approach to give us the desired result -

np.einsum('ijk->ij',np.equal(a[:,None],b).astype(int))
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • That does produce the expected result. The one downside is the temporary storage for `(a[:,None] == b)` and `np.equal(a[:,None],b)`. In the example, the intermediate array has shape `(3, 4, 2)` but in practice, the last dimension may be large, which is why I was hoping to avoid it. – bogatron Oct 31 '16 at 21:56
  • @bogatron What would be the typical shapes of `a`and `b` for the actual use case? – Divakar Oct 31 '16 at 21:57
  • The first dimension of each array is around 10**5. The second dimension (over which the summation occurs) would be either 64 or 128. – bogatron Oct 31 '16 at 23:00
  • @bogatron Well the boolean array of matches `a[:,None] == b` because of being boolean would be `8x` more efficient than int/float arrays. Even then if you get into memory issue, we could run a loop through the cols and do something like a[:,i,None]==b[:,i] and accumulate at each iteration. Other than those, I am out of ideas. – Divakar Oct 31 '16 at 23:07
  • 1
    @Divakar - numexpr ;-) But you have to use `1*(a[:,None] == b)`, as [mentioned here](https://github.com/pydata/numexpr/issues/96). – user6758673 Oct 31 '16 at 23:10
  • @user6758673 Good suggestion actually! That might speedup things specially on large arrays. OP should take note of it. – Divakar Oct 31 '16 at 23:14
  • @Divakar - posted as a new answer – user6758673 Oct 31 '16 at 23:43
2

There's an old issue on the numpy github asking for a generalization of einsum that would allow the use of other functions. The current version just implements a sum of products. As far as I know, no one has taken on that project.

Several years ago I patched einsum, fixing the handling of the '...' notation. So I have a good idea of how it is implemented; and could probably adapt my Python/cython emulator to add this feature. The actual einsum code is written in c.

My guess is that if you don't like Divakar's approach, you'll have to write your own version with cython.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • 1
    The accepted answer does what I need but it would be great to see `einsum` support this feature. I've found it to run much faster than alternate solutions for numerous operations. – bogatron Nov 01 '16 at 15:43
  • I was wondering myself, searching for this feature. It would be awesome if we could reuse the indexing machinery and flexible notation of einsum (and einops) AND also specify arbitrary (I guess commutative and associative ) binary operations instead of multiply for the terms and sum for the reduction. The thing I tried to implement was for example a meshgrid. What would be cool, would be to get each term in an einsum expression as another dimension in a resulting tensor. – Aleks J May 27 '22 at 08:21