I am trying to use the Distances
package in Julia to perform a broadcasted computation of distance matrices.
I understand how to calculate a single N x N
distance matrix for some matrix X
(with dimensions D x N
), where each column X[:,i]
stores a D
-dimensional feature vector for observation i
. The code would be:
using Distances
dist_matrix = pairwise(Euclidean(), X, dims = 2)
dist_matrix
contains the Euclidean distances between each pair of D
-dimensional columns, e.g. dist_matrix[m,n]
stores the Euclidean distance between X[:,m]
and X[:,n]
.
Now imagine my array X
is actually an entire tensor or 'volume' of D
-dimensional observations, so that X[:,i,j]
stores the j
-th 'slice' of my D x N
observations. The whole array X
thus has dimensions D x N x T
, where T
is the number of slices.
Accordingly, I want to compute a tensor or 'volume' of distance matrices, so that dist_matrix
will have dimensions N x N x T
.
Is there a way to do this in a single line by broadcasting the pairwise()
function in Julia? What's the fastest way to do this? Below shows the idea with a basic for loop:
using Distances
dist_matrix_tensor = zeros(N,N,T);
for t = 1:T
dist_matrix_tensor[:,:,t] = pairwise(Euclidean(), X[:,:,t], dims = 2)
end
EDIT:
I figured out how to do this using mapslices
, but still not sure if it's the best way.
using Distances
dist_function(x) = pairwise(Euclidean(), x, dims = 2) # define a function that gets the N x N distance matrix for a single 'slice'
dist_matrix_tensor = mapslices(dist_function, X, dims = [1,2]) # map your matrix-operating function across the slices of the main tensor X
This can of course also be parallelized since each 'slice' of X is independent in this computation, so I'm basically just looking for the fastest way to do this. I'm also in general interested in how you would do this with broadcasting specifically.