
My objective is to replicate the functionality of pdist() from SciPy in Julia. I tried using the Distances.jl package to compute pairwise distances between observations. However, the results are not the same, as seen in the example below.

Python Example:

from scipy.spatial.distance import pdist
a = [[1,2], [3,4], [5,6], [7,8]]
b = pdist(a)
print(b)

output --> array([2.82842712, 5.65685425, 8.48528137, 2.82842712, 5.65685425, 2.82842712])

Julia Example:

using Distances
a = [1 2; 3 4; 5 6; 7 8]   # 4×2 matrix, one observation per row
dist_function(x) = pairwise(Euclidean(), x, dims = 1)   # dims = 1: distances between rows
dist_function(a)

output --> 
4×4 Array{Float64,2}:
 0.0      2.82843  5.65685  8.48528
 2.82843  0.0      2.82843  5.65685
 5.65685  2.82843  0.0      2.82843
 8.48528  5.65685  2.82843  0.0

With reference to the examples above:

  1. Does pdist() from SciPy use the Euclidean metric by default?
  2. How should I approach this problem to replicate the results in Julia?

Please suggest a solution to resolve this problem.

Documentation reference for pdist(): https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

Thanks in advance!!

Mohammad Saad

2 Answers


According to the documentation page you linked, to get the same square-matrix form as Julia in Python (yes, I know, this is the reverse of your question), you can pass the result to squareform. That is, in your example, add:

from scipy.spatial.distance import squareform
squareform(b)
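To make the relationship concrete, here is a pure-Python sketch of the condensed-to-square direction, so it can be run without SciPy. The helper name condensed_to_square is hypothetical and not part of SciPy; it assumes the condensed vector lists the pairs row by row, (0,1), (0,2), ..., (1,2), ..., which is the order pdist uses.

```python
import math

def condensed_to_square(d):
    """Hypothetical helper: expand a condensed distance vector (as returned
    by pdist) into a full symmetric matrix, mimicking squareform."""
    # A condensed vector for n points has n*(n-1)/2 entries; solve for n.
    n = int(round((1 + math.sqrt(1 + 8 * len(d))) / 2))
    if n * (n - 1) // 2 != len(d):
        raise ValueError("length is not n*(n-1)/2 for any integer n")
    m = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    k = 0
    # Pairs are stored row by row: (0,1), (0,2), ..., (1,2), ...
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = d[k]  # fill both halves (symmetric)
            k += 1
    return m

# The pdist output reported in the question.
b = [2.82842712, 5.65685425, 8.48528137, 2.82842712, 5.65685425, 2.82842712]
for row in condensed_to_square(b):
    print(row)
```

The printed rows have the same layout as the 4×4 matrix Julia's pairwise returns in the question.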

Also, yes: from the same documentation page, you can see that the 'metric' parameter defaults to 'euclidean' if not explicitly set.

For the reverse situation, note that the Python vector simply contains the elements above the diagonal, read row by row (for a 'proper' distance metric the distance matrix is symmetric with a zero diagonal, so nothing else is needed).

So you can collect the elements above the diagonal into a vector.
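A minimal sketch of that collection step, in plain Python so it can be compared directly with the pdist output in the question. square_to_condensed is a hypothetical helper name, and the full matrix is rebuilt with math.dist (Python 3.8+) as a stand-in for the Julia pairwise result.

```python
import math

def square_to_condensed(m):
    """Hypothetical helper: collect the entries above the diagonal of a
    symmetric distance matrix, row by row, into one flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

# Full matrix equivalent to the Julia pairwise(...) output in the question,
# rebuilt with math.dist (Euclidean distance) on the same points.
points = [(1, 2), (3, 4), (5, 6), (7, 8)]
full = [[math.dist(p, q) for q in points] for p in points]

print(square_to_condensed(full))
# approximately [2.828, 5.657, 8.485, 2.828, 5.657, 2.828]
```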

Tasos Papastylianou

For (1), the answer is yes, as per the documentation you linked, which shows this signature at the top:

scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs)

indicating that the metric arg is indeed set to 'euclidean' by default.
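As a quick sanity check (a sketch, not a test of SciPy itself), recomputing the pairs by hand with the Euclidean distance reproduces the numbers reported in the question, while a different metric such as Manhattan (cityblock) does not. The helpers condensed, cityblock and close are hypothetical names introduced only for this illustration.

```python
import math

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
# pdist(a) output as reported in the question (default metric).
reported = [2.82842712, 5.65685425, 8.48528137, 2.82842712, 5.65685425, 2.82842712]

def condensed(points, metric):
    """All pairs i < j, row by row, measured with the given metric."""
    n = len(points)
    return [metric(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]

def cityblock(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))

def close(u, v, tol=1e-7):
    return len(u) == len(v) and all(abs(x - y) < tol for x, y in zip(u, v))

print(close(condensed(a, math.dist), reported))  # True
print(close(condensed(a, cityblock), reported))  # False
```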

I'm not sure I understand your second question: aren't the results the same? The only difference I can see is that SciPy returns the upper triangle as a vector, so if that's all you need, have a look at: https://discourse.julialang.org/t/vector-of-upper-triangle/7764
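One thing to watch when extracting the upper triangle in Julia: arrays there are column-major, so, assuming the matrix is traversed in its default order (as a plain loop or filter over its elements would), the upper triangle comes out column by column rather than row by row as SciPy stores it. The Python sketch below, with illustrative variable names only, shows the orders side by side; by symmetry, reading the lower triangle column by column reproduces the SciPy order.

```python
import math

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
n = len(points)
full = [[math.dist(p, q) for q in points] for p in points]

# SciPy's condensed order: upper triangle read row by row.
row_major_upper = [full[i][j] for i in range(n) for j in range(i + 1, n)]

# Upper triangle read column by column (column-major traversal order).
col_major_upper = [full[i][j] for j in range(n) for i in range(j)]

# Lower triangle read column by column: same order as SciPy, by symmetry.
col_major_lower = [full[i][j] for j in range(n) for i in range(j + 1, n)]

print([round(x, 3) for x in row_major_upper])  # [2.828, 5.657, 8.485, 2.828, 5.657, 2.828]
print([round(x, 3) for x in col_major_upper])  # [2.828, 5.657, 2.828, 8.485, 5.657, 2.828]
print(all(math.isclose(x, y) for x, y in zip(col_major_lower, row_major_upper)))  # True
```

For the 4-point example the two upper-triangle orders differ only in the third and fourth entries, so the mismatch is easy to miss.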

Nils Gudat
    Thanks for the suggestion. The Python example gives only a vector while Julia gives the full array; why this happens was explained by Tasos Papastylianou in the previous answer. I wanted to know how I could get output like Python's (a single vector). I managed to extract the upper triangle of the array with the filter approach from the suggested link. Much appreciated! – Mohammad Saad Feb 18 '21 at 01:28