
I have a large set of N d-dimensional vectors (stored as rows of a matrix) that I am lifting by taking the self outer product of each vector (i.e. each vector times its own transpose). For each vector this produces a symmetric d x d matrix with (d+1) choose 2 unique entries. For the entire data set, this is an N x d x d tensor. I would like to compute only the (d+1) choose 2 unique entries from the lower triangle of each tensor slice and store them in a vector. I want to do this with as small a memory footprint and as fast as possible in Python - including using C bindings.

If you do this using standard numpy methods, it allocates the entirety of each matrix. This is roughly double the memory that is actually required.

For a sense of scale here, consider the case where N = 20k and d = 20k. Then N * d^2 * 8 bytes per element = (2*10^4)^3 * 8 bytes = 64 terabytes.

If we only compute the vectors that encode the unique entries, we have (20001 choose 2) * 20k * 8 = 200010000 * 20000 * 8 bytes = 32 terabytes.

Is there a quick way to do this without resorting to slow methods (such as coding my own outer product in python)?

Edit: I'll note that a similar question was asked in Create array of outer products in numpy

I already know how to compute this using einsum (as in the above question). However, no answer there addresses how to do this without the extra (d choose 2) computations and allocations per matrix.
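
For reference, a minimal sketch of that einsum approach (array name and sizes are just illustrative); it allocates the full N x d x d tensor, which is exactly the memory cost I want to avoid:

import numpy as np

N, d = 1000, 50
A = np.random.rand(N, d)

# Full self outer product: allocates an N x d x d array,
# even though each d x d slice is symmetric.
full = np.einsum('ij,ik->ijk', A, A)
print(full.shape)  # (1000, 50, 50)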

Edit 2: This thread How to exploit symmetry in outer product in Numpy (or other Python solutions)? asks a related question but does not address the memory complexity. The top answer will still allocate a d x d array for each outer product.

This thread Numpy Performance - Outer Product of a vector with its transpose also addresses the computational cost of a self outer product, but does not reach a memory-efficient solution.

Edit 3: If one wants to allocate the whole array and then extract the elements, np.tril_indices or scipy.spatial.distance.squareform will do the trick.
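
For completeness, a small sketch of that fallback (allocate the full matrix, then pull out the lower triangle with np.tril_indices); variable names are just illustrative:

import numpy as np

d = 4
v = np.arange(1, d + 1, dtype=float)

# Allocate the full d x d outer product, then extract the lower triangle
# (including the diagonal), i.e. the (d+1 choose 2) unique entries.
full = np.outer(v, v)
il = np.tril_indices(d)
unique_entries = full[il]
print(unique_entries.shape)  # (10,) == (d + 1) * d // 2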

jay
  • `scipy.spatial.distance` can work with the `condensed` and `squareform` distance arrays. `squareform` is the 2d redundant version. `condensed` maps the upper or lower triangle values onto a 1d array. – hpaulj Feb 04 '19 at 21:14
  • Are you suggesting to factor the outer product so it can be passed to `scipy.spatial.distance.pdist`? This would produce the desired result, but I'm unclear yet how to do that factorization – jay Feb 04 '19 at 21:20
  • 1
    Are you looking to get the unique ones as a 1D array/vector or in a N x d x d tensor with the upper diag elements duplicated from the lower ones? – Divakar Feb 05 '19 at 06:29

1 Answer


Not sure exactly how you want your output, but there's always the option of using Numba:

import numpy as np
import numba as nb

# For each row of a, computes only the (d + 1 choose 2) unique products
# of its self outer product (the row-major flattened upper triangle).
@nb.njit(parallel=True)
def self_outer_unique(a):
    n, d = a.shape
    out = np.empty((n, (d * d + d) // 2), dtype=a.dtype)
    for i in nb.prange(n):
        for j1 in range(d):
            for j2 in range(j1, d):
                # Position of (j1, j2) in the flattened upper triangle.
                idx = j1 * (2 * d - j1 + 1) // 2 + j2 - j1
                out[i, idx] = a[i, j1] * a[i, j2]
    return out

This will give you an array with all the unique products on each row (i.e. the flattened upper triangle of the full output).

import numpy as np

a = np.arange(12).reshape(4, 3)
print(a)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
print(self_outer_unique(a))
# [[  0   0   0   1   2   4]
#  [  9  12  15  16  20  25]
#  [ 36  42  48  49  56  64]
#  [ 81  90  99 100 110 121]]

Performance-wise, it is faster than computing the full outer product with NumPy, although recreating the full array from this takes longer.

import numpy as np

def np_self_outer(a):
    # Baseline: full N x d x d self outer product via broadcasting.
    return a[:, :, np.newaxis] * a[:, np.newaxis, :]

def my_self_outer(a):
    # Rebuild the full N x d x d tensor from the unique products.
    b = self_outer_unique(a)
    n, d = a.shape
    b_full = np.zeros((n, d, d), dtype=a.dtype)
    idx0 = np.arange(n)[:, np.newaxis]
    idx1, idx2 = np.triu_indices(d)
    # Scatter the unique products into the upper triangle of each slice,
    # then mirror the strictly upper part into the lower triangle.
    b_full[idx0, idx1, idx2] = b
    b_full += np.triu(b_full, 1).transpose(0, 2, 1)
    return b_full

n, d = 1000, 100
a = np.arange(n * d).reshape(n, d)
print(np.all(np_self_outer(a) == my_self_outer(a)))
# True

%timeit np_self_outer(a)
# 24.6 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit self_outer_unique(a)
# 6.32 ms ± 69.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit my_self_outer(a)
# 124 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
jdehesa