5

Suppose that we want to compute the outer product of a vector with itself:

import numpy as np

a = np.asarray([1, 1.5, 2, 2.5, 3])
A = np.outer(a, a)

print(A)

which results in:

[[ 1.    1.5   2.    2.5   3.  ]
 [ 1.5   2.25  3.    3.75  4.5 ]
 [ 2.    3.    4.    5.    6.  ]
 [ 2.5   3.75  5.    6.25  7.5 ]
 [ 3.    4.5   6.    7.5   9.  ]]

This results in a symmetric matrix. Prior knowledge of the fact that the two vectors in the outer product are the same could be exploited by only computing one triangle of the matrix, and filling in the remaining entries from the corresponding entries in the triangle.


Question: are there any easy ways to exploit such knowledge in numpy (or other solutions in Python)? Of course, it wouldn't be too difficult to write a custom solution in Python, but if that comes at the cost of not using a solid BLAS it would be unlikely to be worth it.

I am primarily concerned with computation time, not so much necessarily with RAM usage.

Dennis Soemers
  • 8,090
  • 2
  • 32
  • 55
  • What's the dimension of your actual input vector? – John Zwinck Jul 23 '18 at 13:38
  • @JohnZwinck Generally about 25-50ish, roughly that order. Not huge, I do have lots of outer products like this though. Profiling has indicated this is one of the parts worth optimizing if possible (specifically, the `numeric.py:1076(outer)` call does show up quite high) – Dennis Soemers Jul 23 '18 at 13:44
  • For such small sizes, you can use : `np.dot(a[:,None] , a[None])` following - https://stackoverflow.com/questions/43453707/numpy-dot-too-clever-about-symmetric-multiplications. – Divakar Jul 23 '18 at 13:51
  • 3
    We just explored this a couple of days ago, https://stackoverflow.com/q/51440840/901925. In those tests `einsum` had a modest time edge. – hpaulj Jul 23 '18 at 14:05

1 Answers1

4

If you can deal with the result only being valid in the lower triangle, this Numba solution is the best I've come up with:

import numba

@numba.njit                 
def outer(arr):
    n = len(arr)
    res = np.empty((n,n), arr.dtype)
    for ii in range(n):
        for jj in range(ii+1):
            res[ii,jj] = arr[ii] * arr[jj]
    return res

It is twice as fast as np.outer() for large vectors, and five times as fast for small ones. If you need the fully populated solution you can set res[jj,ii] = res[ii,jj] in the inner loop and it is still somewhat faster than np.outer().

For some reason, np.multiply.outer() is faster for small vectors than np.outer() (and no slower for large vectors).

John Zwinck
  • 239,568
  • 38
  • 324
  • 436