Sum the squared difference between 2 Numpy arrays

Question

Suppose I have the following 2 arrays:

import numpy as np
a=np.asarray([[1,2,4],
       [3,1,2]])
b=np.asarray([[2,1,1],
       [3,2,3],
       [4,1,2],
       [2,2,1],])

For every row a_row in a, I would like to get the sum of squared differences between a_row and every row in b. The resulting array would be 2 by 4. The expected result is the following:

array([[ 11.,   5.,  14.,  10.],
       [  2.,   2.,   1.,   3.]])

I've already implemented a solution using a loop:

c=np.zeros((2,4))
for e in range(a.shape[0]):
    c[e,:] = np.sum(np.square(b-a[e,:]),axis=1)
print(c)

What I need is a fully vectorized solution, i.e. no loop is required.

Seems like a natural for lambdas and closures. – duffymo, Jun 07 '16 at 19:55

score 4 · Answer 1 · answered Jun 07 '16 at 19:55

If you have access to scipy, then you could do:

import scipy
from scipy.spatial.distance import cdist

import numpy as np

a=np.asarray([[1,2,4],
       [3,1,2]])
b=np.asarray([[2,1,1],
       [3,2,3],
       [4,1,2],
       [2,2,1],])

x = cdist(a,b)**2
# print x
# array([[ 11.,   5.,  14.,  10.],
#        [  2.,   2.,   1.,   3.]])

This uses the cdist function, which is vectorized and fast. You might get a bit more speed with Numba or Cython, but that depends on the size of your arrays in practice.
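As a possible variation, cdist also accepts a metric argument, and 'sqeuclidean' returns the squared Euclidean distance directly, which skips the square root followed by squaring. A minimal sketch using the arrays from the question (a float result is expected, as with the snippet above):

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.asarray([[1, 2, 4],
                [3, 1, 2]])
b = np.asarray([[2, 1, 1],
                [3, 2, 3],
                [4, 1, 2],
                [2, 2, 1]])

# 'sqeuclidean' computes sum((u - v) ** 2) for every pair of rows (u from a, v from b).
x = cdist(a, b, metric='sqeuclidean')
print(x)
```

Since no square root is taken, this should also avoid the small rounding error that cdist(a, b)**2 can introduce for non-integer data.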

Thanks Josh. I've tested and it works fine. However, in this case, I need a fully vectorized solution. i.e. can't use Scipy function. — Allen Qin, Jun 08 '16 at 18:56

score 4 · Accepted Answer · answered Jun 07 '16 at 20:01

Here is a NumPy-idiomatic approach: add a new axis to b so that a broadcasts against it and can be subtracted directly:

>>> np.square(b[:,None] - a).sum(axis=2).T
array([[11,  5, 14, 10],
       [ 2,  2,  1,  3]])
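To make the broadcasting step explicit, here is a small sketch of the same idea with the new axis placed on a instead of b, so the result comes out as 2 by 4 without the final transpose. The names diff and c are just for illustration:

```python
import numpy as np

a = np.asarray([[1, 2, 4],
                [3, 1, 2]])
b = np.asarray([[2, 1, 1],
                [3, 2, 3],
                [4, 1, 2],
                [2, 2, 1]])

# a[:, None, :] has shape (2, 1, 3) and b[None, :, :] has shape (1, 4, 3).
# Broadcasting stretches both to (2, 4, 3): one difference vector per (a_row, b_row) pair.
diff = a[:, None, :] - b[None, :, :]

# Summing the squares over the last axis leaves the (2, 4) result directly.
c = np.square(diff).sum(axis=2)
print(c)
```

Note that diff holds one value per pair of rows per column, so its size grows as rows(a) * rows(b) * columns.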

Mazdak

Hi Kasravand, thanks for your answer. It works with the sample array but I got a memory error in my actual scripts. In my scripts, array a's shape is (500,3072) and b's shape is (5000,3072). I guess it's probably because this method is memory intensive? I didn't get the error using the loop method mentioned in my question. – Allen Qin Jun 08 '16 at 18:54
@Allen I suggest two options. First, if you aren't dealing with large numbers, you can convert your array [type](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) to a smaller type such as `int8`. If that's not possible, you can split your array into smaller arrays, do the operation on each one separately, and then concatenate the results. Here is a good answer: http://stackoverflow.com/questions/31268998/how-to-merge-two-large-numpy-arrays-if-slicing-doesnt-resolve-memory-error – Mazdak Jun 08 '16 at 19:31
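For the shapes mentioned above, the broadcast intermediate would have 500 * 5000 * 3072 elements, about 7.7 billion, or roughly 61 GB at 8 bytes each, which would explain the memory error. Below is a rough sketch of two ways around it. Neither comes from the answers above, and the function names and the chunk parameter are made up for illustration. The first processes a few rows of a at a time, as suggested in the comment. The second uses the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so no 3-D temporary is needed. Both convert to float64, since narrow integer types such as int8 can overflow when the differences are squared.

```python
import numpy as np

def sq_dists_chunked(a, b, chunk=4):
    """Pairwise squared distances, broadcasting over `chunk` rows of `a` at a time.

    `chunk` is a tuning knob: the temporary has chunk * len(b) * d elements.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    out = np.empty((a.shape[0], b.shape[0]))
    for start in range(0, a.shape[0], chunk):
        block = a[start:start + chunk]             # shape (<=chunk, d)
        diff = block[:, None, :] - b[None, :, :]   # shape (<=chunk, len(b), d)
        out[start:start + chunk] = np.square(diff).sum(axis=2)
    return out

def sq_dists_expanded(a, b):
    """Same result via ||x||^2 + ||y||^2 - 2 x.y, using one matrix product."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    aa = np.sum(a * a, axis=1)                     # squared norm of each row of a
    bb = np.sum(b * b, axis=1)                     # squared norm of each row of b
    d = aa[:, None] + bb[None, :] - 2.0 * a.dot(b.T)
    return np.maximum(d, 0.0)                      # clip tiny negatives caused by rounding

a = np.asarray([[1, 2, 4], [3, 1, 2]])
b = np.asarray([[2, 1, 1], [3, 2, 3], [4, 1, 2], [2, 2, 1]])
print(sq_dists_chunked(a, b))
print(sq_dists_expanded(a, b))
```

The expanded form only allocates arrays of shape (rows(a), rows(b)), but it can lose some precision when rows are nearly identical, so it may be worth comparing it against the loop on a small sample first.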
