I am new to both Python and scikit-learn, so please bear with me.
I took this source code for the k-means clustering algorithm from k means clustering.
I then modified it to run on my local data set by using the load_files function.
Although the algorithm terminates, it does not produce any output showing which documents are clustered together.
I found that the km object has a "km.labels_" array, which lists the centroid id of each document.
It also has the centroid vectors in "km.cluster_centers_".
But which document is it? I have to map it back to "dataset", which is a "Bunch" object.
If I print dataset.data[0], I get the contents of the first file, which I think are shuffled, but I just want to know the file name.
I am confused by questions like: is the document at dataset.data[0] assigned to the centroid at km.labels_[0]?
My basic problem is finding which files are clustered together. How can I find that?
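
To make the goal concrete, here is a minimal sketch of the mapping I am after. It assumes that entry i of km.labels_ refers to the same document as entry i of dataset.filenames, which should hold only if the feature matrix passed to km.fit was built from dataset.data without reordering. The helper name group_files_by_cluster and the sample file names and labels are made up for illustration.

```python
from collections import defaultdict


def group_files_by_cluster(filenames, labels):
    """Return {cluster_id: [filenames]}, pairing entry i of both sequences."""
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    clusters = defaultdict(list)
    for name, label in zip(filenames, labels):
        # int() turns NumPy integer labels into plain Python ints
        clusters[int(label)].append(name)
    return dict(clusters)


# Hypothetical stand-ins for dataset.filenames and km.labels_
sample_filenames = ["docs/a.txt", "docs/b.txt", "docs/c.txt", "docs/d.txt"]
sample_labels = [1, 0, 1, 0]

grouped = group_files_by_cluster(sample_filenames, sample_labels)
for cluster_id, names in sorted(grouped.items()):
    print(cluster_id, names)
```

With the real objects, the call would presumably be group_files_by_cluster(dataset.filenames, km.labels_), if that ordering assumption is correct.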