Here is my problem. Let's say my two arrays are:
import numpy as np
first = np.array(["hello", "hello", "hellllo"])
second = np.array(["hlo", "halo", "alle"])
Now I want to get the matrix of distances between every pair of elements from the two arrays.
For example, say my distance function is:
def diff_len(string1, string2):
    # absolute difference between the lengths of the two strings
    return abs(len(string1) - len(string2))
So I would like to get the matrix:
        hello    hello    hellllo
hlo     result1  result2  result3
halo    result4  result5  result6
alle    result7  result8  result9
What I did was compute the matrix row by row, using NumPy's vectorize function:
vectorize_dist = np.vectorize(diff_len)
first = np.array(["hello", "hello", "hellllo"])
second = np.array(["hlo", "halo", "alle"])
# each call compares every element of first with one element of second (one row)
vectorize_dist(first, "hlo")
vectorize_dist(first, "halo")
vectorize_dist(first, "alle")
matrix = np.array([vectorize_dist(first, "hlo"),
                   vectorize_dist(first, "halo"),
                   vectorize_dist(first, "alle")])
matrix
array([[2, 2, 4],
       [1, 1, 3],
       [1, 1, 3]])
But to get my matrix this way I have to loop and compute one row after another, whereas I would like to get the whole matrix at once. My two arrays could be very large, so running a loop could take too much time. I also have several distances to compute, so I would have to repeat the procedure for each of them, which would be even more time-consuming.
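To make "at once" concrete, here is a minimal sketch of the kind of call I have in mind, assuming NumPy broadcasting can be applied to the two arrays. The names `matrix_broadcast`, `matrix_lengths`, `len_first` and `len_second` are only illustrative. Note that `np.vectorize` is essentially a Python loop internally, so the first variant removes the explicit loop but probably not its cost; the second variant only works because this particular distance can be written with array operations on the string lengths.

```python
import numpy as np


def diff_len(string1, string2):
    # absolute difference between the lengths of the two strings
    return abs(len(string1) - len(string2))


first = np.array(["hello", "hello", "hellllo"])
second = np.array(["hlo", "halo", "alle"])

# Variant 1: let np.vectorize broadcast a (1, 3) row against a (3, 1) column.
# Rows follow `second`, columns follow `first`, as in the matrix above.
vectorize_dist = np.vectorize(diff_len)
matrix_broadcast = vectorize_dist(first[np.newaxis, :], second[:, np.newaxis])

# Variant 2 (specific to this distance): compute the string lengths once,
# then broadcast a plain integer subtraction.
len_first = np.char.str_len(first)
len_second = np.char.str_len(second)
matrix_lengths = np.abs(len_second[:, np.newaxis] - len_first[np.newaxis, :])

print(matrix_broadcast)
print(matrix_lengths)
```

Both variants should give the same 3x3 matrix as the row-by-row version, but I do not know whether something like this scales to large arrays or to distances that cannot be reduced to array arithmetic.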