
According to the documentation:

For N dimensions dot is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

I would like to compute the sum product over the last axis of a and the second-to-last axis of b, but without forming the Cartesian product over the remaining axes, because the remaining axes have the same shape. Let me illustrate with an example:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
c = np.dot(a, b)
c.shape # = (11, 12, 11, 12, 13)

But I would like the shape to be (11, 12, 13). The desired effect can be achieved using broadcasting:

c = np.sum(a[..., None] * b, axis=-2)
c.shape # = (11, 12, 13)
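
As a sanity check, note that this broadcast result picks out exactly the entries of the full np.dot output where the leading axes of a and b coincide. A minimal sketch with smaller, hypothetical shapes:

import numpy as np

a = np.random.normal(size=(3, 4, 5))
b = np.random.normal(size=(3, 4, 5, 5))
full = np.dot(a, b)                         # shape (3, 4, 3, 4, 5)
want = np.sum(a[..., None] * b, axis=-2)    # shape (3, 4, 5)
i, j = np.arange(3)[:, None], np.arange(4)  # index the matching leading axes
assert np.allclose(full[i, j, i, j], want)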

But my arrays are relatively large, and I'd like to use parallelised BLAS implementations, which np.sum does not appear to use but np.dot does. Any ideas on how to achieve this?

Till Hoffmann

2 Answers


You can use np.einsum -

c = np.einsum('ijk,ijkl->ijl', a, b)
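
As a quick sanity check (a minimal sketch reusing the shapes from the question), this agrees with the broadcasting approach:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
c_einsum = np.einsum('ijk,ijkl->ijl', a, b)
c_bcast = np.sum(a[..., None] * b, axis=-2)
assert c_einsum.shape == (11, 12, 13)
assert np.allclose(c_einsum, c_bcast)
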
Divakar
  • I really like `einsum` but it doesn't parallelise using BLAS. – Till Hoffmann May 05 '16 at 09:22
  • @TillHoffmann Well, AFAIK, dot based funcs like `np.dot` and `np.tensordot` that use `blas` won't allow aligned axes, like with `einsum` here we are keeping the first two axes aligned between `a` and `b`. On performance, for the sample datasizes, this `einsum` approach is proving to be `4x+` better :) – Divakar May 05 '16 at 09:28
  • 2
    Great, thanks for the clarification. For anyone who's interested, `np.einsum('...i,...ij->...j', a, b)` is agnostic to dimensionality. – Till Hoffmann May 05 '16 at 09:36
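
To illustrate that last comment, here is a minimal sketch (with hypothetical shapes) showing that the ellipsis form handles any number of leading batch axes:

import numpy as np

a2 = np.random.normal(size=(7, 13))            # one leading axis
b2 = np.random.normal(size=(7, 13, 13))
a4 = np.random.normal(size=(2, 3, 4, 13))      # three leading axes
b4 = np.random.normal(size=(2, 3, 4, 13, 13))
np.einsum('...i,...ij->...j', a2, b2).shape    # (7, 13)
np.einsum('...i,...ij->...j', a4, b4).shape    # (2, 3, 4, 13)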

You can also use np.matmul:

c = np.matmul(a[..., None, :], b)[..., 0, :]

This is equivalent to the new @ operator in Python 3.5+:

c = (a[..., None, :] @ b)[..., 0, :]
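
To see why the extra axis is needed: np.matmul treats the last two dimensions as stacked matrices and broadcasts over the leading ones, so a[..., None, :] turns a into a stack of 1x13 row vectors. A minimal sketch, reusing the arrays from the question:

import numpy as np

a = np.random.normal(size=(11, 12, 13))
b = np.random.normal(size=(11, 12, 13, 13))
# a[..., None, :] has shape (11, 12, 1, 13): stacks of 1x13 row vectors;
# matmul against b (stacks of 13x13 matrices) gives (11, 12, 1, 13)
c = np.matmul(a[..., None, :], b)[..., 0, :]  # drop the singleton row axis
assert c.shape == (11, 12, 13)
assert np.allclose(c, np.einsum('ijk,ijkl->ijl', a, b))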

There's not much difference in performance; if anything, np.einsum seems to be marginally faster for your example arrays:

In [1]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
   ....: np.einsum('...i,...ij->...j', a, b)
   ....: 
The slowest run took 5.24 times longer than the fastest. This could mean that an
intermediate result is being cached.
10000 loops, best of 3: 26.7 µs per loop

In [2]: %%timeit a = np.random.randn(11, 12, 13); b = np.random.randn(11, 12, 13, 13)
   ....: np.matmul(a[..., None, :], b)[..., 0, :]
   ....: 
10000 loops, best of 3: 28 µs per loop
ali_m