Questions tagged [spherical-kmeans]

In spherical k-means, all vectors are normalized to unit length, and the distance measure is cosine dissimilarity.

In classic k-means, we seek to minimize the (squared) Euclidean distance between each cluster center and the members of its cluster. The intuition is that the radial distance from the cluster center to each element should be similar for all elements of that cluster.

In spherical k-means, the idea is to place each cluster center so that the angle between it and the cluster's members is both uniform and minimal. The intuition is like looking at stars: the points should have consistent spacing between each other. That spacing is simpler to quantify as cosine similarity, but it means there are no "Milky Way" galaxies forming large bright swathes across the sky of the data.
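As a rough illustration of the description above, here is a minimal NumPy sketch of spherical k-means. It is not taken from any of the questions below or from a particular library; the names `normalize_rows` and `spherical_kmeans`, the random initialization from data points, and the fixed iteration cap are all assumptions made for the example.

```python
import numpy as np


def normalize_rows(X):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X by cosine similarity (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(np.asarray(X, dtype=float))
    # Assumption: start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # On unit vectors the dot product equals the cosine similarity.
        similarity = X @ centers.T
        new_labels = similarity.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                # Sum the members, then project back onto the unit sphere.
                centers[j] = normalize_rows(members.sum(axis=0, keepdims=True))[0]
    return labels, centers
```

Because every row is normalized first, two vectors that point in the same direction but differ greatly in length end up in the same cluster, which is the main practical difference from classic k-means. Like classic k-means, the result can depend on the initial centers.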


Source: Difference between standard and spherical k-means algorithms

9 questions
6
votes
1 answer

opencv: how to clusterize by angle using kmeans()

The question is how to cluster pairs of some units by their angle. The problem is that kmeans operates on the notion of Euclidean distance and does not know about the periodic nature of angles. So to make it work, one needs to translate the angle to…
2
votes
2 answers

Calculating Cosine Similarity in Julia for K-Means

I am working on an implementation of K-means clustering in Julia. The task is to figure out and implement a modification of k-means that instead measures similarity by the angle between vectors. So I assumed that one could use cosine similarity for this, I…
Bob Pen
  • 67
  • 6
1
vote
1 answer

How to cluster "text document" with "spherical k-means" using Python?

I have finished implementing the traditional k-means text clustering. However, right now, I need to revise my program to "spherical k-means text clustering" but have not succeeded yet. I've searched for solutions on sites but still cannot revise my…
joyce chiu
  • 49
  • 7
1
vote
0 answers

skmeans not producing silhouette plot in r

I have clustered a large dataset (~1M observations and ~200 features) with skmeans and I want to validate the results. The problem is that, according to the tutorial, skmeans produces an object which is suitable for silhouette calculations, but if I follow the…
0
votes
1 answer

Kmeans clustering using TENSORFLOW2

How could I convert a pandas DataFrame that contains 47 columns and 99999 rows into a tensor in TensorFlow 2? Is the KMeans algorithm already implemented in TF 2? I ask because the command tf.contrib.factorization.KMeans does not work under TF2 since…
0
votes
1 answer

tensorflow kmeans doesn't seem to take new initial points

I'm trying to find the best cluster set in my data by taking the result with the lowest average distance from many k-means trials in TensorFlow. But my code doesn't update the initial centroids in each trial, so all results are the same. Here's my code1 -…
cornandme
  • 47
  • 2
  • 7
0
votes
1 answer

The K-Means++ Algorithm - Explain the Choice of the Next Cluster Center

Just like in the picture, why not just choose point 2 as the second cluster center? Why generate a random number between [0,1] instead? def initialize(X, K):#kmean++ m,n=shape(X) C =…
ileadall42
  • 631
  • 2
  • 7
  • 19
0
votes
1 answer

Kmeans algorithm for k=2 which gives equal cluster size outputs

I am using a modified Lloyd's algorithm for obtaining equal cluster size outputs in kmeans with k=2. Following is the pseudocode: - Randomly choose 2 points as initialization for the 2 clusters (denoted as c1, c2) - Repeat below steps until…
-1
votes
1 answer

Create product clustering based on customer views

I have 1 million rows like this: customer_id product_id_viewed 12345 [756436, 369955, 1244356, 4689667] I want to cluster the products that are typically viewed together into separate clusters based on an aggregate of the customers viewing…
Deetro
  • 1
  • 1