
Suppose I have the following 2 arrays:

import numpy as np
a=np.asarray([[1,2,4],
       [3,1,2]])
b=np.asarray([[2,1,1],
       [3,2,3],
       [4,1,2],
       [2,2,1],])

For every row a_row in a, I would like to get the sum of squared differences between a_row and every row in b. The resulting array would be 2 by 4. The expected result is the following:

array([[ 11.,   5.,  14.,  10.],
       [  2.,   2.,   1.,   3.]])

I've already implemented a solution using a loop:

c=np.zeros((2,4))
for e in range(a.shape[0]):
    c[e,:] = np.sum(np.square(b-a[e,:]),axis=1)
print(c)

What I need is a fully vectorized solution, i.e. one with no explicit loop.

Allen Qin

2 Answers


If you have access to scipy, then you could do:

from scipy.spatial.distance import cdist

import numpy as np

a=np.asarray([[1,2,4],
       [3,1,2]])
b=np.asarray([[2,1,1],
       [3,2,3],
       [4,1,2],
       [2,2,1],])

x = cdist(a,b)**2
# print(x)
# array([[ 11.,   5.,  14.,  10.],
#        [  2.,   2.,   1.,   3.]])

This uses the cdist function, which is vectorized and fast. You may be able to get a bit more speed with numba or cython, but whether that is worthwhile depends on the size of your arrays in practice.
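As a small variation on the above, cdist also accepts a `metric` argument, and `'sqeuclidean'` returns the squared distances directly, so the separate `**2` step is not needed. A minimal sketch using the same sample arrays:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.asarray([[1, 2, 4],
                [3, 1, 2]])
b = np.asarray([[2, 1, 1],
                [3, 2, 3],
                [4, 1, 2],
                [2, 2, 1]])

# 'sqeuclidean' computes the sum of squared differences for every
# (row of a, row of b) pair, giving an array of shape (2, 4).
x = cdist(a, b, metric='sqeuclidean')
print(x)
```

This skips the square root that the default Euclidean metric computes, only for it to be squared again afterwards.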

JoshAdel
  • Thanks Josh. I've tested it and it works fine. However, in this case I need a fully vectorized solution, i.e. I can't use a SciPy function. – Allen Qin Jun 08 '16 at 18:56

Here is a NumPythonic approach: add a new axis to b so that a can be broadcast against it and subtracted directly:

>>> np.square(b[:,None] - a).sum(axis=2).T
array([[11,  5, 14, 10],
       [ 2,  2,  1,  3]])
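To make the broadcasting easier to follow, here is the same idea written out step by step with the intermediate shapes. This is just an illustrative sketch; the names `diff` and `c` are mine, and the new axes are placed so that no final transpose is needed:

```python
import numpy as np

a = np.asarray([[1, 2, 4],
                [3, 1, 2]])          # shape (2, 3)
b = np.asarray([[2, 1, 1],
                [3, 2, 3],
                [4, 1, 2],
                [2, 2, 1]])          # shape (4, 3)

# a[:, None, :] has shape (2, 1, 3) and b[None, :, :] has shape (1, 4, 3).
# Broadcasting pairs every row of a with every row of b: shape (2, 4, 3).
diff = a[:, None, :] - b[None, :, :]

# Square element-wise, then sum over the last axis (the 3 coordinates).
c = np.square(diff).sum(axis=2)      # shape (2, 4)
print(c)
```

Note that the intermediate `diff` array holds one entry per (row of a, row of b, column) triple, so its size grows with the product of all three dimensions.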
Mazdak
  • Hi Kasravand, thanks for your answer. It works with the sample array but I got a memory error in my actual scripts. In my scripts, array a's shape is (500,3072) and b's shape is (5000,3072). I guess it's probably because this method is memory intensive? I didn't get the error using the loop method mentioned in my question. – Allen Qin Jun 08 '16 at 18:54
  • @Allen I suggest two options. First, if you aren't dealing with large numbers, you can convert your array [type](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) to a smaller type like `int8`. If that's not possible, you can split your array into smaller arrays, do the operation on each of them separately, and then concatenate the results. Here is a good answer: http://stackoverflow.com/questions/31268998/how-to-merge-two-large-numpy-arrays-if-slicing-doesnt-resolve-memory-error – Mazdak Jun 08 '16 at 19:31
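For context on the memory error: with a of shape (500, 3072) and b of shape (5000, 3072), the broadcast difference has 5000 × 500 × 3072 ≈ 7.7 billion elements, which is roughly 61 GB at 8 bytes per element. Below is a rough sketch of two ways around that, not something taken from the answers above. The function names `sq_dists_chunked` and `sq_dists_expanded` are hypothetical. The first follows the "split and concatenate" suggestion; the second uses the identity ||x − y||² = ||x||² + ||y||² − 2·x·y, which needs no Python loop and never builds the 3-D intermediate. Both cast to float64, since narrow integer types such as `int8` can overflow once differences are squared.

```python
import numpy as np

def sq_dists_chunked(a, b, chunk=4):
    """Pairwise squared distances, broadcasting `chunk` rows of a at a time.

    The temporary array per step has chunk * len(b) * a.shape[1] elements,
    so `chunk` (an assumed default here) should be tuned to available memory.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    out = np.empty((a.shape[0], b.shape[0]))
    for start in range(0, a.shape[0], chunk):
        block = a[start:start + chunk]               # (k, d)
        diff = block[:, None, :] - b[None, :, :]     # (k, n, d)
        out[start:start + chunk] = np.square(diff).sum(axis=2)
    return out

def sq_dists_expanded(a, b):
    """Pairwise squared distances via ||x||^2 + ||y||^2 - 2 x.y (no loop)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a_sq = np.square(a).sum(axis=1)[:, None]         # (m, 1)
    b_sq = np.square(b).sum(axis=1)[None, :]         # (1, n)
    # Only the (m, n) result is allocated; a.dot(b.T) is a matrix product.
    return a_sq + b_sq - 2.0 * a.dot(b.T)

a = np.asarray([[1, 2, 4],
                [3, 1, 2]])
b = np.asarray([[2, 1, 1],
                [3, 2, 3],
                [4, 1, 2],
                [2, 2, 1]])

print(sq_dists_chunked(a, b))
print(sq_dists_expanded(a, b))
```

With `chunk=1` the first function reduces to the loop in the question. The expanded form is usually the faster of the two, but on floating-point data it can differ from the direct computation by small rounding errors (occasionally giving tiny negative values where the true distance is zero).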