7

I want to multiply B = A @ A.T in numpy. Obviously, the answer would be a symmetric matrix (i.e. B[i, j] == B[j, i]).

However, it is not clear to me how to leverage this easily to cut the computation time down in half (by only computing the lower triangle of B and then using that to get the upper triangle for free).

Is there a way to perform this optimally?

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
DankMasterDan
  • 1,900
  • 4
  • 23
  • 35
  • 1
    Search the BLAS or other optimized libraries for such a function, and link it to your code with `cython` or other such tool. It's too specialized for existing `numpy` functions. – hpaulj Jun 07 '18 at 04:58
  • @hpaulj. But not for scipy perhaps? – Mad Physicist Jun 07 '18 at 05:07
  • Actually, numpy does it for you, see [this](https://stackoverflow.com/a/43454451/7207392) post. – Paul Panzer Jun 07 '18 at 06:19
  • @PaulPanzer, how'd you find that! To think I commented on the topic, and didn't remember :( Memory must be going. `dot` is detecting the special case and using a different `BLAS` call. – hpaulj Jun 07 '18 at 06:39
  • @hpaulj _To think I commented on the topic, and didn't remember_ yeah, memory is a funny thing. I, actually did remember because I found it so fascinating at the time. – Paul Panzer Jun 07 '18 at 07:02

2 Answers2

10

As noted in @PaulPanzer's link, dot can detect this case. Here's the timing proof:

In [355]: A = np.random.rand(1000,1000)
In [356]: timeit A.dot(A.T)
57.4 ms ± 960 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [357]: B = A.T.copy()
In [358]: timeit A.dot(B)
98.6 ms ± 805 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numpy dot too clever about symmetric multiplications

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • With `A = np.random.rand(1000,3); B = A.T.copy()`.I get : `%timeit A.dot(B) : 1000 loops, best of 3: 1.37 ms per loop` and `%timeit A.dot(A.T) : 100 loops, best of 3: 2.2 ms per loop`. Can you reproduce? If so, do you know why transpose version is slower with such a case of small no. of cols? Is `syrk` slower with those? – Divakar Sep 17 '18 at 07:28
2

You can always use sklearns's pairwise_distances

Usage:

from sklearn.metrics.pairwise import pairwise_distances
gram = pairwise_distance(x, metric=metric)

Where metric is a callable or a string defining one of their implemented metrics (full list in the link above)


But, I wrote this for myself a while back so I can share what I did:

import numpy as np

def computeGram(elements, dist):
    n    = len(elements)
    gram = np.zeros([n, n])
    for i in range(n):
        for j in range(i + 1):
            gram[i, j] = dist(elements[i], elements[j])

    upTriIdxs       = np.triu_indices(n)
    gram[upTriIdxs] = gram.T[upTriIdxs]

    return gram

Where dist is a callable, in your case np.inner

Nimrod Morag
  • 938
  • 9
  • 20