
According to the documentation:

For N dimensions dot is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

I would like to compute the sum product over the last axis of a and the second-to-last axis of b, but without forming the Cartesian product over the remaining axes, because the remaining axes have the same shape. Let me illustrate with an example:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
c = np.dot(a, b)
c.shape # = (11, 12, 11, 12, 13)

But I would like the shape to be (11, 12, 13). The desired effect can be achieved using broadcasting:

c = np.sum(a[..., None] * b, axis=-2)
c.shape # = (11, 12, 13)
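
As a sanity check, note that this broadcast result picks out exactly the entries of the full np.dot output where the leading axes of a and b coincide. A minimal sketch with smaller, hypothetical shapes:

import numpy as np

a = np.random.normal(size=(3, 4, 5))
b = np.random.normal(size=(3, 4, 5, 5))
full = np.dot(a, b)                         # shape (3, 4, 3, 4, 5)
want = np.sum(a[..., None] * b, axis=-2)    # shape (3, 4, 5)
i, j = np.arange(3)[:, None], np.arange(4)  # index the matching leading axes
assert np.allclose(full[i, j, i, j], want)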

But my arrays are relatively large, and I'd like to use parallelised BLAS implementations, which np.sum does not appear to use but np.dot does. Any ideas on how to achieve this?

Till Hoffmann

2 Answers


You can use np.einsum -

c = np.einsum('ijk,ijkl->ijl', a, b)
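
As a quick sanity check (a minimal sketch reusing the shapes from the question), this agrees with the broadcasting approach:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
c_einsum = np.einsum('ijk,ijkl->ijl', a, b)
c_bcast = np.sum(a[..., None] * b, axis=-2)
assert c_einsum.shape == (11, 12, 13)
assert np.allclose(c_einsum, c_bcast)
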
Divakar
  • I really like `einsum` but it doesn't parallelise using BLAS. – Till Hoffmann May 05 '16 at 09:22
  • @TillHoffmann Well, AFAIK, dot based funcs like `np.dot` and `np.tensordot` that use `blas` won't allow aligned axes, like with `einsum` here we are keeping the first two axes aligned between `a` and `b`. On performance, for the sample datasizes, this `einsum` approach is proving to be `4x+` better :) – Divakar May 05 '16 at 09:28
  • 2
    Great, thanks for the clarification. For anyone who's interested, `np.einsum('...i,...ij->...j', a, b)` is agnostic to dimensionality. – Till Hoffmann May 05 '16 at 09:36
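
To illustrate that last comment, here is a minimal sketch (with hypothetical shapes) showing that the ellipsis form handles any number of leading batch axes:

import numpy as np

a2 = np.random.normal(size=(7, 13))            # one leading axis
b2 = np.random.normal(size=(7, 13, 13))
a4 = np.random.normal(size=(2, 3, 4, 13))      # three leading axes
b4 = np.random.normal(size=(2, 3, 4, 13, 13))
np.einsum('...i,...ij->...j', a2, b2).shape    # (7, 13)
np.einsum('...i,...ij->...j', a4, b4).shape    # (2, 3, 4, 13)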

You can also use np.matmul:

c = np.matmul(a[..., None, :], b)[..., 0, :]

This is equivalent to the new @ operator in Python 3.5+:

c = (a[..., None, :] @ b)[..., 0, :]
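
To see why the extra axis is needed: np.matmul treats the last two dimensions as stacked matrices and broadcasts over the leading ones, so a[..., None, :] turns a into a stack of 1x13 row vectors. A minimal sketch, reusing the arrays from the question:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
# a[..., None, :] has shape (11, 12, 1, 13): stacks of 1x13 row vectors;
# matmul against b (stacks of 13x13 matrices) gives (11, 12, 1, 13)
c = np.matmul(a[..., None, :], b)[..., 0, :]  # drop the singleton row axis
assert c.shape == (11, 12, 13)
assert np.allclose(c, np.einsum('ijk,ijkl->ijl', a, b))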

There's not much difference in performance; if anything, np.einsum seems to be marginally faster for your example arrays:

In [1]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
   ....: np.einsum('...i,...ij->...j', a, b)
   ....: 
The slowest run took 5.24 times longer than the fastest. This could mean that an
intermediate result is being cached.
10000 loops, best of 3: 26.7 µs per loop

In [2]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
   ....: np.matmul(a[..., None, :], b)[..., 0, :]
   ....: 
10000 loops, best of 3: 28 µs per loop
ali_m