Numpy efficient matrix self-multiplication (gram matrix)

Question

I want to multiply B = A @ A.T in numpy. Obviously, the answer would be a symmetric matrix (i.e. B[i, j] == B[j, i]).

However, it is not clear to me how to leverage this easily to cut the computation time down in half (by only computing the lower triangle of B and then using that to get the upper triangle for free).

Is there a way to perform this optimally?

Search the BLAS or other optimized libraries for such a function, and link it to your code with `cython` or other such tool. It's too specialized for existing `numpy` functions. — hpaulj, Jun 07 '18 at 04:58
Actually, numpy does it for you, see [this](https://stackoverflow.com/a/43454451/7207392) post. — Paul Panzer, Jun 07 '18 at 06:19
@PaulPanzer, how'd you find that! To think I commented on the topic, and didn't remember :( Memory must be going. `dot` is detecting the special case and using a different `BLAS` call. — hpaulj, Jun 07 '18 at 06:39
@hpaulj _To think I commented on the topic, and didn't remember_ yeah, memory is a funny thing. I, actually did remember because I found it so fascinating at the time. — Paul Panzer, Jun 07 '18 at 07:02

score 10 · Accepted Answer · answered Jun 07 '18 at 06:35

10

As noted in @PaulPanzer's link, dot can detect this case. Here's the timing proof:

In [355]: A = np.random.rand(1000,1000)
In [356]: timeit A.dot(A.T)
57.4 ms ± 960 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [357]: B = A.T.copy()
In [358]: timeit A.dot(B)
98.6 ms ± 805 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numpy dot too clever about symmetric multiplications

answered Jun 07 '18 at 06:35

hpaulj

221,503
14
230
353

With `A = np.random.rand(1000,3); B = A.T.copy()`.I get : `%timeit A.dot(B) : 1000 loops, best of 3: 1.37 ms per loop` and `%timeit A.dot(A.T) : 100 loops, best of 3: 2.2 ms per loop`. Can you reproduce? If so, do you know why transpose version is slower with such a case of small no. of cols? Is `syrk` slower with those? – Divakar Sep 17 '18 at 07:28

score 2 · Answer 2 · answered Jun 07 '18 at 05:29

You can always use sklearns's pairwise_distances

Usage:

from sklearn.metrics.pairwise import pairwise_distances
gram = pairwise_distance(x, metric=metric)

Where metric is a callable or a string defining one of their implemented metrics (full list in the link above)

But, I wrote this for myself a while back so I can share what I did:

import numpy as np

def computeGram(elements, dist):
    n    = len(elements)
    gram = np.zeros([n, n])
    for i in range(n):
        for j in range(i + 1):
            gram[i, j] = dist(elements[i], elements[j])

    upTriIdxs       = np.triu_indices(n)
    gram[upTriIdxs] = gram.T[upTriIdxs]

    return gram

Where dist is a callable, in your case np.inner

In my sample tests these calculations are 20x slower than `x.dot(x)`. — hpaulj, Jun 07 '18 at 05:39
@hpaulj yeah it's a bit of an overkill for `dot()`, I wrote this to handle more complicated metrics — Nimrod Morag, Jun 07 '18 at 05:41

Numpy efficient matrix self-multiplication (gram matrix)

2 Answers2

Linked

Related