
I am trying to calculate the Euclidean distance between two INDArrays (assuming INDArray is the ND4J counterpart of a NumPy array). In Python, I do it like this:

import numpy as np
from scipy.spatial.distance import cdist
arr1 = np.random.rand(3, 2)  # placeholder: any (m, d) array, one vector per row
arr2 = np.random.rand(2, 2)  # placeholder: any (n, d) array with the same d
ans = cdist(arr2 , arr1)

Example:

arr1 = [[20.73 62.67]    # each row is a vector, so arr1 has 3 2-dimensional vectors
        [93.47 13.83]
        [50.01 16.60]]

arr2 = [[20.66  6.09]    # arr2 has 2 2-dimensional vectors
        [51.79 85.14]]

ans =  [[56.57 73.21 31.17]   # distances from each vector of arr2 (rows) to each vector of arr1 (columns)
        [38.33 82.59 68.55]]

Please help me achieve this in Java. I don't know much about Java. So far, I have concluded that ND4J can do this, but I don't know how.

NOTE: I am not looking for a for-loop implementation. I am trying to measure the performance impact of vectorization on the Euclidean distance calculation, and I have read that ND4J supports SIMD and vectorization just like NumPy. For details

foobar

1 Answer


Use Transforms.euclideanDistance(a, b) for the distance between two same-shape tensors, or something like this for the "along dimension" case:

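// Nd4j is org.nd4j.linalg.factory.Nd4j; Transforms is org.nd4j.linalg.ops.transforms.Transforms.
// val and log come from Lombok (lombok.val and a class-level @Slf4j).
@Test
public void testEuclidean() {
    // Each row is one 2-dimensional vector: arr1 is 3x2, arr2 is 2x2.
    val arr1 = Nd4j.createFromArray(20.73, 62.67, 93.47, 13.83, 50.01, 16.60).reshape(3, 2);
    val arr2 = Nd4j.createFromArray(20.66, 6.09, 51.79, 85.14).reshape(2, 2);

    // Pairwise distances between the rows of the two arrays, reducing along dimension 1.
    val result = Transforms.allEuclideanDistances(arr1, arr2, 1);
    log.info("Result: {}", result);
}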

Edit: added code sample for allEuclideanDistances.
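
To sanity-check the layout of the result, it can help to compare against a plain-Java baseline. The following is only a minimal sketch and not ND4J code: the class name PairwiseEuclidean and the method pairwiseDistances are hypothetical names made up for illustration. It fills out[i][j] with the distance between row i of the first argument and row j of the second, which is the same layout as cdist(arr2, arr1) in the question.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of ND4J.
final class PairwiseEuclidean {

    // out[i][j] = Euclidean distance between row a[i] and row b[j].
    // Assumes every row of a and b has the same length.
    static double[][] pairwiseDistances(double[][] a, double[][] b) {
        double[][] out = new double[a.length][b.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                double sum = 0.0;
                for (int k = 0; k < a[i].length; k++) {
                    double diff = a[i][k] - b[j][k];
                    sum += diff * diff;
                }
                out[i][j] = Math.sqrt(sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] arr1 = {{20.73, 62.67}, {93.47, 13.83}, {50.01, 16.60}};
        double[][] arr2 = {{20.66, 6.09}, {51.79, 85.14}};

        // arr2 first, arr1 second, matching cdist(arr2, arr1): a 2x3 result.
        double[][] ans = pairwiseDistances(arr2, arr1);
        for (double[] row : ans) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With the inputs from the question this produces a 2x3 matrix whose entries match the question's ans values to within about 0.02. A loop like this is also the kind of baseline a vectorized call would be timed against.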

raver119
  • Transforms.allEuclideanDistances(a, b, axis) then. – raver119 Apr 26 '20 at 19:04
  • I've added an example of allEuclideanDistances() to the answer. It will give you values similar to your Python script's, but in a different order. – raver119 Apr 26 '20 at 19:14
  • This works exactly like the Python script. arr2 is passed before arr1 in the Python example: `Transforms.allEuclideanDistances(arr2, arr1, 1);` – foobar Apr 27 '20 at 16:18
  • Can we do this in a more efficient way? This method takes even more time than using regular for loops (which is counterintuitive). ND4J was supposed to use "vectorized c++ code for all numerical operations". – foobar Apr 28 '20 at 18:45
  • https://www.javadoc.io/static/org.nd4j/nd4j-api/1.0.0-beta5/org/nd4j/linalg/api/ops/impl/reduce3/EuclideanDistance.html#EuclideanDistance-org.nd4j.linalg.api.ndarray.INDArray-org.nd4j.linalg.api.ndarray.INDArray-int...- Will this work better? I don't know how to implement this. – foobar Apr 29 '20 at 08:39