I'm a numpy baby and am looking at using numpy.vectorize() to compute a distance matrix. I think a key part of this is the signature param, but when I run the code below I get an error:
import numpy as np
from scipy.spatial.distance import jaccard

# Find Jaccard dissimilarities for a constant 1 row * m columns array vs each array in an
# n rows * m columns nested array, outputting a 1 row * n columns array of dissimilarities
vectorised_compute_jac = np.vectorize(jaccard, signature='(m),(n,m)->(n)')

array_list = [[1, 2, 3],  # arrA
              [2, 3, 4],  # arrB
              [4, 5, 6]]  # arrC

distance_matrix = np.array([])
for target_array in array_list:
    print(target_array)
    print(array_list)
    # row should be an array of jac distances between target_array and each array in array_list
    row = vectorised_compute_jac(target_array, array_list)
    print(row, '\n\n')
    # np.vectorize() functions return an array of objects of type specified by the otypes param, based on docs
    np.append(distance_matrix, row)
Output + Error:
[1, 2, 3]
[[1, 2, 3], [2, 3, 4], [4, 5, 6]]
Traceback (most recent call last):
  File "C:\Users\u03132tk\.spyder-py3\ModuleMapper\untitled1.py", line 21, in <module>
    row = vectorised_compute_jac(array, array_list)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2163, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2237, in _vectorize_call
    res = self._vectorize_call_with_signature(func, args)
  File "C:\ANACONDA3\lib\site-packages\numpy\lib\function_base.py", line 2277, in _vectorize_call_with_signature
    results = func(*(arg[index] for arg in args))
  File "C:\ANACONDA3\lib\site-packages\scipy\spatial\distance.py", line 893, in jaccard
    v = _validate_vector(v)
  File "C:\ANACONDA3\lib\site-packages\scipy\spatial\distance.py", line 340, in _validate_vector
    raise ValueError("Input vector should be 1-D.")
ValueError: Input vector should be 1-D.
What I would like (square brackets indicate numpy arrays, not lists, based on the array output types discussed in the code comments above):
#  arrA    arrB    arrC
[[JD(AA), JD(AB), JD(AC)],  # arrA
 [JD(BA), JD(BB), JD(BC)],  # arrB
 [JD(CA), JD(CB), JD(CC)]]  # arrC
Can someone advise how the signature param works and whether that's causing my woes? I suspect it's due to the (n,m) in my signature, as it's the only multi-dimensional thing, hence the question :(
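For reference, a minimal sketch of how the signature is interpreted. The signature describes the "core" dimensions that a single call of the wrapped function receives and returns; any extra leading dimensions are looped over and broadcast by np.vectorize. So '(m),(n,m)->(n)' hands the whole 2-D array to the function in one call, which would explain the "should be 1-D" complaint. The sketch assumes a signature of '(m),(m)->()' instead. Note that pair_distance is a hypothetical stand-in for scipy's jaccard (same calling convention: two 1-D vectors in, one scalar out), and the 0/1 data is made up for illustration.

```python
import numpy as np


def pair_distance(u, v):
    """Hypothetical stand-in for scipy.spatial.distance.jaccard.

    Takes two 1-D vectors and returns one scalar dissimilarity.
    """
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    if u.ndim != 1 or v.ndim != 1:
        raise ValueError("Input vector should be 1-D.")
    union = np.logical_or(u, v).sum()
    if union == 0:
        return 0.0
    return float(np.logical_xor(u, v).sum() / union)


# Illustrative data only (not the arrays from the question).
array_list = np.array([[1, 0, 1],   # arrA
                       [0, 1, 1],   # arrB
                       [1, 1, 0]])  # arrC

# The signature from the question: one call gets a (m,) and a full (n, m)
# array, so the wrapped function sees a 2-D second argument and raises.
wrong = np.vectorize(pair_distance, signature='(m),(n,m)->(n)')
try:
    wrong(array_list[0], array_list)
    failure = None
except ValueError as err:
    failure = str(err)

# Assumed fix: describe what ONE call sees (two length-m vectors, scalar out).
vectorised_distance = np.vectorize(pair_distance, signature='(m),(m)->()')

# Shape (m,) against (n, m): the loop dimension n is broadcast, giving (n,).
row = vectorised_distance(array_list[0], array_list)

# Shape (n, 1, m) against (1, n, m): loop dimensions broadcast to (n, n),
# so the full matrix comes out of a single call with no Python loop.
distance_matrix = vectorised_distance(array_list[:, None, :],
                                      array_list[None, :, :])

print(failure)
print(row.shape, distance_matrix.shape)
```

Separately from the signature, np.append returns a new array rather than modifying distance_matrix in place, so the loop in the question would leave distance_matrix empty even without the error; the broadcast call in the sketch sidesteps that.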
Cheers! Tim