I am using a modified version of Lloyd's algorithm to obtain equal-sized clusters in k-means with k=2. The pseudocode is as follows:
- Randomly choose 2 points as the initial centroids of the 2 clusters (denoted c1, c2)
- Repeat the steps below until convergence:
  - Sort all points xi by ascending value of ||xi-c1|| - ||xi-c2||, i.e. the difference between the distance to the first centroid and the distance to the second
  - Put the top 50% of the points in cluster 1 and the rest in cluster 2
  - Recalculate the centroids as the average of the allocated points (as usual in Lloyd's)
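
To make the steps above concrete, here is a minimal pure-Python sketch of how I read the pseudocode. The function names, the `max_iter` and `seed` parameters, the stopping rule (stop when the assignment no longer changes), and the handling of an odd number of points (cluster 2 gets the extra point) are my own assumptions, not part of the description above.

```python
import math
import random


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def balanced_two_means(points, max_iter=100, seed=0):
    """Balanced k=2 variant of Lloyd's algorithm (illustrative sketch).

    points: list of equal-length tuples of floats, with at least 2 entries.
    Returns (labels, (c1, c2)); label 0 means cluster 1, label 1 means cluster 2.
    """
    rng = random.Random(seed)
    n = len(points)
    half = n // 2  # assumption: for odd n, cluster 2 receives the extra point
    c1, c2 = rng.sample(points, 2)  # random initialization from the data
    labels = None
    for _ in range(max_iter):
        # Sort point indices by ascending ||x - c1|| - ||x - c2||.
        order = sorted(
            range(n),
            key=lambda i: math.dist(points[i], c1) - math.dist(points[i], c2),
        )
        # The first half of the sorted order goes to cluster 1, the rest to cluster 2.
        new_labels = [1] * n
        for i in order[:half]:
            new_labels[i] = 0
        # Assumed convergence criterion: the assignment did not change.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute centroids as the mean of the assigned points.
        c1 = centroid([p for p, lab in zip(points, labels) if lab == 0])
        c2 = centroid([p for p, lab in zip(points, labels) if lab == 1])
    return labels, (c1, c2)
```

Calling `balanced_two_means(points)` on a list of coordinate tuples returns one label per point, with exactly `len(points) // 2` points labeled 0.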
The above algorithm works well for me empirically:
- It gives balanced clusters
- It decreases the objective at every iteration
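
For reference, this is roughly how the second observation can be checked, assuming "the objective" means the usual k-means within-cluster sum of squared distances. The helper name `kmeans_objective` and the 0/1 label convention are hypothetical choices for illustration.

```python
import math


def kmeans_objective(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid.

    labels[i] is the index (0 or 1) into centroids for points[i].
    """
    return sum(
        math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels)
    )
```

Recording this value after every centroid update and comparing consecutive values shows whether the objective went down in that iteration.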
Has such an algorithm been proposed or analyzed before in the literature? I would appreciate any references.