
Given an array of sentence embeddings (vectors of length 512) with shape (1000000, 512), how do I calculate the cosine similarity of each of the 1 million sentence embeddings against every other embedding in the array, ideally using TensorFlow so I can try to speed it up with a GPU?

jdoig

2 Answers

You can calculate the pairwise cosine distances this way:

import numpy as np
import tensorflow as tf

X = np.random.uniform(0, 10, (100, 512)).astype('float32')
X = tf.constant(X)

def compute_cosine_distances(a, b):
    # L2-normalize each row so the dot products become cosine similarities
    normalize_a = tf.nn.l2_normalize(a, 1)
    normalize_b = tf.nn.l2_normalize(b, 1)
    # pairwise cosine distance = 1 - cosine similarity
    distance = 1 - tf.matmul(normalize_a, normalize_b, transpose_b=True)
    return distance

compute_cosine_distances(X, X)

which gives the same result as scikit-learn:

from sklearn.metrics.pairwise import pairwise_distances

pairwise_distances(X.numpy(), metric='cosine')
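
At the full (1000000, 512) scale, note that the complete distance matrix has 10^12 entries (about 4 TB in float32), so it can't be materialized in one go. A minimal sketch of one workaround, computing the matrix in row chunks and reducing each block before moving on (the function name chunked_cosine_distances and the chunk_size parameter are my own, not from the answer):

import numpy as np
import tensorflow as tf

def chunked_cosine_distances(x, chunk_size=10000):
    # normalize all rows once up front
    x = tf.nn.l2_normalize(x, 1)
    for start in range(0, x.shape[0], chunk_size):
        chunk = x[start:start + chunk_size]
        # one (chunk_size, n) block of the full distance matrix
        yield 1 - tf.matmul(chunk, x, transpose_b=True)

X = tf.constant(np.random.uniform(0, 10, (100000, 512)).astype('float32'))
for block in chunked_cosine_distances(X):
    # reduce each block before the next, e.g. tf.math.top_k(-block, k=5)
    # to keep only the nearest neighbours of each row
    pass

Each block is small enough to fit in GPU memory, and normalizing once up front avoids repeating that work per chunk.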

Marco Cerliani
Cosine similarity is a metric used to measure how similar two documents (or embeddings) are, irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. Note that tf.keras.losses.CosineSimilarity returns the negative of the cosine similarity (so that minimizing the loss maximizes the similarity), which is why comparing an array with itself always gives -1:
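
In symbols, for vectors a and b (the standard definition, with the negation the Keras loss applies):

similarity(a, b) = (a · b) / (‖a‖ * ‖b‖)
loss(a, b) = -similarity(a, b)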

import tensorflow as tf

# identical rows, so each row-wise cosine similarity is 1 and the loss is -1
y_true = [[2., 8.], [1., 7.]]
y_pred = [[2., 8.], [1., 7.]]
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
print(cosine_loss(y_true, y_pred).numpy())

output: -1.0000001
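
This loss compares y_true and y_pred row by row, producing one value per pair of aligned rows, so it does not by itself give the all-against-all comparison the question asks for. A hedged sketch of how broadcasting with the related tf.keras.losses.cosine_similarity function could produce the full pairwise matrix (the variable names are my own):

import tensorflow as tf

emb = tf.constant([[2., 8.], [1., 7.], [3., 5.]])
# broadcast (3, 1, 2) against (1, 3, 2) to compare every row with every other;
# negate because the loss function returns the negative cosine similarity
pairwise = -tf.keras.losses.cosine_similarity(
    emb[:, None, :], emb[None, :, :], axis=-1)
print(pairwise.numpy())  # (3, 3) matrix of pairwise cosine similarities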

    Sorry, maybe I didn't ask the question correctly. What I want is each element compared against every other element in the array. So, given sentence embeddings [a, b, c], I want to know how similar a is to b & c, how similar b is to a & c, and how similar c is to a & b. – jdoig Jun 05 '20 at 08:22