Calculating Cosine Similarity in Julia for K-Means

Question

I am working on an implementation of K-means clustering in Julia. The task is:

Figure out and implement a modification of k-means that alternatively measures similarity by the angle between vectors.

I assumed that cosine similarity could be used for this. My code already works for regular K-means, where I calculate the squared Euclidean distance like this:

Distances[:,i] = sum((X.-C[[i],:]).^2, dims=2) # C holds the centers (one per row); column i stores each point's distance to the i-th center

I tried to do the same with cosine similarity, like this:

Distances[:, i] = sum(1 .- ((X*C[[i], :]).^2 /(sum(X.^2, dims=2).*(C[[i],:]'*C[[i],:]))))

But this does not seem to work.

Have I misunderstood the question or am I implementing it wrong?

score 3 · Accepted Answer · answered Feb 09 '21 at 20:44

In my Beta Machine Learning Package, module Utils, I implemented the distances as:

using LinearAlgebra
"""L1 norm distance (aka _Manhattan Distance_)"""
l1_distance(x,y)     = sum(abs.(x-y))
"""Euclidean (L2) distance"""
l2_distance(x,y)     = norm(x-y)
"""Squared Euclidean (L2) distance"""
l2²_distance(x,y)    = norm(x-y)^2
"""Cosine distance"""
cosine_distance(x,y) = 1 - dot(x,y)/(norm(x)*norm(y)) # 1 - cosine similarity, so smaller means closer

I then use them in the cluster module. Note that you need the standard library package LinearAlgebra.
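Applied to the layout in the question (rows of X are data points, rows of C are centers), a vectorized version could look like the sketch below. The helper name cosine_distance_to_center is hypothetical, and the code assumes that no data point or center is the zero vector. Compared with the attempt in the question, it multiplies X by a plain vector (X*C[[i], :] is an n×d times 1×d product and fails with a dimension mismatch), it uses a scalar norm for the center (C[[i],:]'*C[[i],:] is a d×d outer product), and it drops the outer sum, which would collapse the whole column into a single number.

```julia
using LinearAlgebra

# Cosine distance from every row of X (n×d) to a single center c (length d).
# Hypothetical helper; assumes no row of X and no center is all zeros.
function cosine_distance_to_center(X::AbstractMatrix, c::AbstractVector)
    row_norms = sqrt.(vec(sum(abs2, X, dims=2)))      # norm of each data point
    similarities = (X * c) ./ (row_norms .* norm(c))  # cosine of the angle, per row
    return 1 .- similarities                          # distance: 0 means same direction
end

# Small made-up example: 3 points, 2 centers
X = [1.0 0.0; 0.0 2.0; 3.0 3.0]
C = [1.0 0.0; 1.0 1.0]
D = zeros(size(X, 1), size(C, 1))
for i in 1:size(C, 1)
    D[:, i] = cosine_distance_to_center(X, C[i, :])
end
```

Here D plays the role of the Distances matrix from the question. Because only the angle matters, the point [3.0 3.0] ends up at distance (approximately) zero from the center [1.0 1.0] even though their lengths differ.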

score 1 · Answer 2 · answered Feb 09 '21 at 12:09

I managed to solve it by using CosineDist from the Distances.jl package (available on GitHub). One could also calculate the distance manually, following the code in that repository or other implementations.

I calculated the distance from each data point to the i-th cluster center:

Distances[:, i] = [evaluate(CosineDist(), X[j, :], C[i, :]) for j in 1:size(X, 1)] # one distance per data point (row of X)
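For readers who prefer not to add a dependency, the same per-point loop can be sketched with only the standard library, together with the assignment step that picks the closest center for each point. The names cosine_dist and assign_by_angle are hypothetical and not part of Distances.jl, and the sketch assumes there are no zero vectors among the points or centers.

```julia
using LinearAlgebra

# Cosine distance between two vectors; assumes neither is the zero vector.
cosine_dist(x, y) = 1 - dot(x, y) / (norm(x) * norm(y))

# Hypothetical helper: for each row of X, the index of the center (row of C)
# with the smallest angle to it.
function assign_by_angle(X::AbstractMatrix, C::AbstractMatrix)
    n, k = size(X, 1), size(C, 1)
    # n×k matrix: entry (j, i) is the distance from point j to center i
    D = [cosine_dist(X[j, :], C[i, :]) for j in 1:n, i in 1:k]
    return [argmin(D[j, :]) for j in 1:n]
end

# Made-up data: two points near the first axis, two near the second
X = [2.0 0.1; 0.1 5.0; 4.0 0.0; 1.0 9.0]
C = [1.0 0.0; 0.0 1.0]
labels = assign_by_angle(X, C)
```

With this data, the points near the first axis are assigned to center 1 and those near the second axis to center 2, regardless of how long each vector is.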
