1

Suppose we have 64-dimensional data to cluster; let's say the dataset is a 64x150 matrix, dt.

Using the kmeans function from the VLFeat library, I cluster my dataset into 20 centers:

[centers, assignments] = vl_kmeans(dt, 20);

centers is a 64x20 matrix.

assignments is a 1x150 matrix with values inside it.

According to the manual: The vector assignments contains the (hard) assignments of the input data to the clusters.

I still cannot understand what the numbers in the assignments matrix mean. I don't get it at all. Would anyone mind helping me a bit here? An example would be great. What do these values represent?

Thms
  • The `assignments` output is a vector mapping each of your 150 data instances to one of the 20 clusters. The values in `assignments` should range from 1 to 20. – Autonomous Jul 09 '14 at 03:50
  • @ParagS.Chandakkar I really liked how you described it in just two lines; it finally clicked. Thanks. – Thms Jul 09 '14 at 16:48

2 Answers

8

In k-means, the problem you are solving is clustering your 150 points into 20 clusters. Each point is 64-dimensional and is therefore represented by a vector of size 64. So in your case dt is the set of points, and each column is one 64-dim vector.

After running the algorithm you get centers and assignments. centers holds the 20 cluster centers as positions in the 64-dim space, in case you want to visualize them, measure distances between points and clusters, etc. assignments, on the other hand, holds the cluster index assigned to each 64-dim point in dt. So if assignments(7) is 15, the 7th vector (column) in dt belongs to the 15th cluster, whose center is the 15th column of centers.

For example, here you can see a clustering of many 2-D points, say 1000, into 3 clusters. In this case dt would be 2x1000, centers would be 2x3, and assignments would be 1x1000, holding numbers ranging from 1 to 3 (or 0 to 2 if you're using OpenCV).
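To make the reading of assignments concrete, here is a small Python sketch. It does not call vl_kmeans; the labels are made-up toy data (6 points, 3 clusters), stored 1-based to mirror what MATLAB/VLFeat returns, and the helper names cluster_of and members_by_cluster are hypothetical.

```python
# Toy stand-in for the `assignments` output: 6 points, 3 clusters.
# Values are illustrative, not real vl_kmeans output. Labels are 1-based
# to mirror MATLAB/VLFeat (OpenCV would use 0-based labels instead).
assignments = [2, 1, 3, 1, 2, 2]


def cluster_of(assignments, point_number):
    """Return the 1-based cluster label of the 1-based point number."""
    return assignments[point_number - 1]


def members_by_cluster(assignments):
    """Map each cluster label to the 1-based point numbers assigned to it."""
    groups = {}
    for point_number, label in enumerate(assignments, start=1):
        groups.setdefault(label, []).append(point_number)
    return groups


groups = members_by_cluster(assignments)

print(cluster_of(assignments, 5))  # 2: the 5th point belongs to cluster 2
for label in sorted(groups):
    print(label, groups[label])
```

Every point appears in exactly one group, which is what "hard" assignment means: one label per point, with no partial memberships.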

[Image: k-means clustering of 2-D points into 3 clusters]
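As a rough sketch of where such labels come from, the Python snippet below assigns each 2-D point to its nearest center by squared Euclidean distance, which is the assignment step of standard k-means. The points, the centers, and the function names are invented for illustration; this is not the VLFeat or pyPR implementation, and it skips the step that re-estimates the centers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_assign(points, centers):
    """Return, for each point, the 1-based index of its nearest center."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centers]
        # +1 so labels run from 1 to len(centers), as in MATLAB/VLFeat.
        labels.append(distances.index(min(distances)) + 1)
    return labels


# Hypothetical data: 5 points and 3 centers in 2-D.
points = [(0.0, 0.1), (5.0, 5.2), (0.2, -0.1), (9.9, 0.0), (5.1, 4.8)]
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

labels = hard_assign(points, centers)
print(labels)  # [1, 2, 1, 3, 2]
```

The result has one entry per point, just as assignments is 1x1000 for 1000 points, and each entry is an index into the list of centers.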

EDIT: The code that produces this image is available here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on k-means for PyPR.

zenpoy
  • +1. Maybe, to make it complete, add the code that generates the clusters and those charts? – Dan Jul 09 '14 at 06:53
0

In OpenCV, it is the index of the cluster that each input point belongs to.

Martin Beckett