I want to calculate the cosine distance between row vectors using MXNet. Additionally, I am working with batches of samples and would like to calculate the cosine distance for each pair of corresponding samples (e.g. the 1st row vector of batch #1 with the 1st row vector of batch #2).

Cosine distance between two vectors is defined as in scipy.spatial.distance.cosine:

1 - (u · v) / (||u||_2 * ||v||_2)
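
As a reference for that definition, here is a minimal plain-Python sketch for a single pair of vectors. The function name cosine_distance is hypothetical (it is not part of SciPy or MXNet), and the sketch assumes both vectors are non-zero and have the same length.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two equal-length, non-zero vectors."""
    # Dot product of the two vectors
    dot = sum(x * y for x, y in zip(u, v))
    # Euclidean (L2) norm of each vector
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # One minus the cosine of the angle between u and v
    return 1.0 - dot / (norm_u * norm_v)

# Orthogonal vectors have cosine similarity 0, so the distance is 1
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Parallel vectors give a distance close to 0 and opposite vectors a distance close to 2, so the result always lies in the range [0, 2].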

Thom Lane

1 Answer

You can use mx.nd.batch_dot to compute the cosine distance row by row across the batch:

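import mxnet as mx

def batch_cosine_dist(a, b):
    # a, b: NDArrays of shape (batch_size, dim)
    a1 = mx.nd.expand_dims(a, axis=1)  # (batch_size, 1, dim)
    b1 = mx.nd.expand_dims(b, axis=2)  # (batch_size, dim, 1)
    # Row-wise dot products: (batch_size, 1, 1) -> (batch_size,)
    d = mx.nd.batch_dot(a1, b1)[:, 0, 0]
    # L2 norm of each row
    a_norm = mx.nd.sqrt(mx.nd.sum(a * a, axis=1))
    b_norm = mx.nd.sqrt(mx.nd.sum(b * b, axis=1))
    # Note: an all-zero row makes the denominator 0 and yields NaN
    dist = 1.0 - d / (a_norm * b_norm)
    return dist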

It returns an array of batch_size distances, one for each pair of corresponding rows.

batch_size = 3
dim = 2
a = mx.random.uniform(shape=(batch_size, dim))
b = mx.random.uniform(shape=(batch_size, dim))
dist = batch_cosine_dist(a, b)
print(dist.asnumpy())

# e.g. [ 0.04385382  0.25792354  0.10448891] (values depend on the random draw)
Thom Lane
  • This code is incomplete: you need to check for division by zero, or you will get NaN values whenever a row is all zeros. Alternatively, you can approximate the cosine by adding a small epsilon to the norm. – Marsu_ Dec 02 '19 at 21:55