I'm trying to avoid a double for loop in Python by using NumPy, but I'm not sure whether that's even possible.

I have two 3D matrices that I've flattened into two 2D matrices using numpy's flatten(). Now I need to run a calculation between each row of the first matrix and every row of the second. Each row represents an image, and the calculation takes two row vectors and returns a scalar.
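For reference, ndarray.flatten() on its own collapses an array to 1-D, so getting one row per image usually means a reshape. Here is a minimal sketch, assuming a stack shaped (n_images, height, width); the name `images` and the sizes are made up for illustration:

```python
import numpy as np

# Hypothetical stack of 4 images, each 2x3 pixels.
images = np.arange(4 * 2 * 3).reshape(4, 2, 3)

# Keep the first axis (one entry per image) and merge the rest into one row.
rows = images.reshape(len(images), -1)

print(rows.shape)  # (4, 6)
```

Each row of `rows` is then one flattened image, which is the layout the diagram below assumes.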

A [a, b, c, d]     A' [a, b, c, d]
B [e, f, g, h]     B' [e, f, g, h]
C [i, j, k, l]     C' [i, j, k, l]
D [m, n, o, p]     D' [m, n, o, p]

result
[AA' AB' AC' AD']
[BA' BB' BC' BD']
[CA' CB' CC' CD']
[DA' DB' DC' DD']

EDIT: Here's my double for loop

import numpy as np

def euclidean_distance(a, b):
    # Euclidean (L2) distance between two 1-D vectors
    return np.sqrt(np.sum(np.square(np.subtract(a, b))))

def find_distance_of_two_sets(aMatrix, bMatrix):
    # distance[i][j] holds the distance between row i of aMatrix and row j of bMatrix
    distance = np.zeros((len(aMatrix), len(bMatrix)))
    for i, a in enumerate(aMatrix):
        for j, b in enumerate(bMatrix):
            distance[i][j] = euclidean_distance(a, b)
    return distance

aMatrix = np.array([[5, 3, 2, 1, 4, 2],
        [7, 0, 3, 5, 7, 9],
        [9, 8, 0, 2, 4, 8],
        [3, 5, 2, 0, 1, 9],
        [7, 7, 4, 1, 7, 6],
        [5, 9, 8, 9, 6, 1]])

distance = find_distance_of_two_sets(aMatrix, aMatrix)
with open('distanceMatrix', 'wb') as outputFile:
    np.save(outputFile, distance)

Printing the result gives:

[[ 0.          9.38083152  9.05538514  8.18535277  7.         11.87434209]
 [ 9.38083152  0.          9.79795897 10.14889157  8.66025404 13.82027496]
 [ 9.05538514  9.79795897  0.          7.93725393  5.91607978 13.52774926]
 [ 8.18535277 10.14889157  7.93725393  0.          8.36660027 15.03329638]
 [ 7.          8.66025404  5.91607978  8.36660027  0.         10.67707825]
 [11.87434209 13.82027496 13.52774926 15.03329638 10.67707825  0.        ]]
Yitzak Hernandez

1 Answer

Add a new axis to one copy of the array so that broadcasting forms every pairwise row difference in a single vectorized operation.

Setup

import numpy as np

a = np.array([[5, 3, 2, 1, 4, 2],
        [7, 0, 3, 5, 7, 9],
        [9, 8, 0, 2, 4, 8],
        [3, 5, 2, 0, 1, 9],
        [7, 7, 4, 1, 7, 6],
        [5, 9, 8, 9, 6, 1]])

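# a[:, None] has shape (6, 1, 6), so the subtraction broadcasts over every pair of rows
d = (a - a[:, None])**2
# sum the squared differences over the last axis, then take the square root
np.sqrt(d.sum(-1)).round(2)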

array([[ 0.  ,  9.38,  9.06,  8.19,  7.  , 11.87],
       [ 9.38,  0.  ,  9.8 , 10.15,  8.66, 13.82],
       [ 9.06,  9.8 ,  0.  ,  7.94,  5.92, 13.53],
       [ 8.19, 10.15,  7.94,  0.  ,  8.37, 15.03],
       [ 7.  ,  8.66,  5.92,  8.37,  0.  , 10.68],
       [11.87, 13.82, 13.53, 15.03, 10.68,  0.  ]])
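The same idea works when the two sets of rows differ, which is what the question's `find_distance_of_two_sets(aMatrix, bMatrix)` signature suggests. Below is a minimal sketch; the helper name `pairwise_euclidean` is made up for illustration, and it assumes both inputs are 2-D with the same number of columns:

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Hypothetical helper: distance from every row of a to every row of b."""
    # a[:, None, :] has shape (n, 1, d); b[None, :, :] has shape (1, m, d).
    # Broadcasting stretches both to (n, m, d), so diff[i, j] = a[i] - b[j].
    diff = a[:, None, :] - b[None, :, :]
    # Sum the squared differences over the last (feature) axis -> shape (n, m).
    return np.sqrt((diff ** 2).sum(axis=-1))

# First three and next two rows of the example matrix, used as two separate sets.
a = np.array([[5, 3, 2, 1, 4, 2],
              [7, 0, 3, 5, 7, 9],
              [9, 8, 0, 2, 4, 8]])
b = np.array([[3, 5, 2, 0, 1, 9],
              [7, 7, 4, 1, 7, 6]])

result = pairwise_euclidean(a, b)
print(result.shape)  # (3, 2)
```

Here `result[i, j]` is the distance between row `i` of `a` and row `j` of `b`, so the output has one row per row of `a` and one column per row of `b`.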

Performance

a = np.random.rand(100, 100)

%%timeit
d = (a - a[:, None])**2
np.sqrt(d.sum(-1)).round(2)

7.68 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
distance = np.zeros((100, 100))
for i, el1 in enumerate(a):
    for j, el2 in enumerate(a):
        distance[i][j] = np.sqrt(np.sum(np.square(np.subtract(el1, el2))))

51.1 ms ± 1.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
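One caveat: the broadcast version builds an intermediate of shape (n, n, d), which can get large when each row is a whole flattened image. A possible workaround, sketched here and not benchmarked, is to expand the squared distance as |x|² + |y|² - 2 x·y so that only (n, m) arrays are created. The function name `pairwise_euclidean_lowmem` is made up for illustration, and the clamp to zero is a guard against small negative values from floating-point rounding:

```python
import numpy as np

def pairwise_euclidean_lowmem(a, b):
    """Hypothetical helper: pairwise distances without an (n, m, d) intermediate."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_sq = (a ** 2).sum(axis=1)[:, None]   # squared row norms, shape (n, 1)
    b_sq = (b ** 2).sum(axis=1)[None, :]   # squared row norms, shape (1, m)
    # |x - y|^2 = |x|^2 + |y|^2 - 2 x.y, evaluated for all pairs at once.
    sq = a_sq + b_sq - 2.0 * (a @ b.T)
    # Rounding can leave tiny negative values; clamp before the square root.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

rng = np.random.default_rng(0)
x = rng.random((50, 20))

# Compare against the broadcast approach from the answer.
expected = np.sqrt(((x - x[:, None]) ** 2).sum(-1))
print(np.allclose(pairwise_euclidean_lowmem(x, x), expected, atol=1e-6))
```

The trade-off is a small loss of precision for nearly identical rows, so the comparison uses a tolerance instead of exact equality.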
user3483203