
I'm trying to write a Python function that takes a vector (1x128) and finds the most similar column in a large unsorted matrix (2000x128). This function is called ~100,000 times in my application. There was no problem when I was working on a desktop PC, but it runs very slowly on a Raspberry Pi. Here is my function:

import numpy as np

def find_similar_index(a):
    d = []
    norma = np.linalg.norm(a)
    for i in range(2000):  # one distance per column of A
        d.append(np.abs(np.linalg.norm(a - A[:, i])) / norma)
    return np.argmin(d)

Can I improve anything in this function to make it faster? Can I use the GPU of the Raspberry Pi for this kind of computation?

  • For what you're trying to obtain this sounds pretty close to the necessary number of calls (you might be able to reduce the number of calls by a small amount, but nothing extreme). I'm not familiar with Python GPU processing, so I can't help you out there. – Z. Bagley Apr 11 '17 at 20:31
  • You should just try applying `@jit` to the function. It uses Numba to speed things up. It'll surely make it faster since you call it ~0.1M times (see the sketch after these comments). – kmario23 Apr 11 '17 at 20:31
  • If `A` is `(2000x128)`, thus `A[:,i]` would be of length `2000` and since you are doing `a - A[:, i]`, shouldn't `a` be of length `2000` too? – Divakar Apr 11 '17 at 20:49
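
Following up on the `@jit` suggestion above, here is a minimal sketch of what that could look like, assuming Numba is available on the Raspberry Pi (the function name and the explicit scalar loops are illustrative, written in the style Numba compiles well):

import numpy as np
from numba import jit

@jit(nopython=True)
def find_similar_index_jit(a, A):
    # Track the smallest squared distance seen so far; the sqrt and the
    # division by norm(a) do not change which column index wins.
    best_idx = 0
    best_dist = np.inf
    for i in range(A.shape[1]):
        dist = 0.0
        for j in range(A.shape[0]):
            diff = a[j] - A[j, i]
            dist += diff * diff
        if dist < best_dist:
            best_dist = dist
            best_idx = i
    return best_idx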

1 Answer


Here's one approach using broadcasting and np.einsum -

subs = (a[:,None] - A)                       # broadcast a against every column of A
sq_dist = np.einsum('ij,ij->j',subs, subs)   # squared distance to each column
min_idx = np.abs(sq_dist).argmin()

Another way to get sq_dist uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b (expand (a - b)·(a - b)) -

sq_dist = (A**2).sum(0) + a.dot(a) - 2*a.dot(A) 

With np.einsum, that can be boosted to -

sq_dist = np.einsum('ij,ij->j',A,A) + a.dot(a) - 2*a.dot(A)

Also, since all we need is the index of the closest column, and since the distances from np.linalg.norm are non-negative for real-valued inputs, we can skip np.abs and also skip the scaling down by norma: neither the monotonic square root nor division by a positive constant changes which index is smallest.
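
As a quick sanity check of that claim (illustrative values only):

import numpy as np

d = np.random.rand(2000)   # squared distances: non-negative
norma = 3.7                # any positive scale factor
# sqrt is monotonic and dividing by a positive constant preserves order,
# so all three pick the same index:
assert d.argmin() == np.sqrt(d).argmin() == (np.sqrt(d) / norma).argmin()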

Runtime test

Approaches -

def app0(a,A): # Original approach
    d = []
    for i in range(0, A.shape[1]):
        d.append(np.linalg.norm(a - A[:, i]))
    return np.argmin(d)

def app1(a,A):
    subs = (a[:,None] - A)
    sq_dist = np.einsum('ij,ij->j',subs, subs)
    return sq_dist.argmin()

def app2(a,A):
    sq_dist = (A**2).sum(0) + a.dot(a) - 2*a.dot(A) 
    return sq_dist.argmin()

def app3(a,A):
    sq_dist = np.einsum('ij,ij->j',A,A) + a.dot(a) - 2*a.dot(A)
    return sq_dist.argmin()

Since you mentioned that the vector has shape (1x128) and you are looking for the columns of A most similar to it, each column must be of length 128, so I am assuming A is shaped (128, 2000). With those assumptions, here's a setup and timings using the listed approaches -

In [194]: A = np.random.rand(128,2000)
     ...: a = np.random.rand(128)
     ...: 

In [195]: %timeit app0(a,A)
100 loops, best of 3: 9.21 ms per loop

In [196]: %timeit app1(a,A)
1000 loops, best of 3: 330 µs per loop

In [197]: %timeit app2(a,A)
1000 loops, best of 3: 287 µs per loop

In [198]: %timeit app3(a,A)
1000 loops, best of 3: 291 µs per loop

In [200]: 9210/287.0 # Speedup number
Out[200]: 32.09059233449477
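
For completeness, a quick way to confirm on this setup that all four approaches pick the same column (a small sketch using the definitions above):

# All four approaches should return the same index for the same inputs.
results = [app0(a, A), app1(a, A), app2(a, A), app3(a, A)]
assert len(set(results)) == 1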
  • Approach 3 is working way faster than I expected. I really appreciate it! – user2431358 Apr 11 '17 at 21:31
  • @user2431358 Awesome! Please make sure to make corrections (if any) in the question regarding the queries on the shapes of `a` and `A`. – Divakar Apr 11 '17 at 21:32