I'm trying to perform spectral embedding/clustering using Normalized Cuts. I wrote the following code, but I am stuck on one logical step: what do I have to do after clustering the eigenvectors? I don't know how to form the clusters on my original dataset. (A is my affinity matrix.)

import numpy as np
from scipy.cluster.vq import kmeans2

# A is my (symmetric) affinity matrix
d = np.sum(A, axis=0)                   # degree of each node
D = np.diag(d)
D_half_inv = np.diag(1.0 / np.sqrt(d))
M = D_half_inv @ (D - A) @ D_half_inv   # normalized Laplacian
# compute eigenvalues (ascending) and eigenvectors (stored as columns of v)
w, v = np.linalg.eigh(M)
# scale row i by 1/sqrt(d_i), then renorm each eigenvector (column) to have norm 1
v = v / np.sqrt(d)[:, np.newaxis]
v = v / np.linalg.norm(v, axis=0)
# omit the eigenvector of the smallest eigenvalue (which is 0); 45 is my embedding dimension
v_trailing = v[:, 1:45]
k = 20  # number of clusters
# idx[i] is the cluster label of row i of v_trailing
centroids, idx = kmeans2(v_trailing, k)

After that, I get labels for each eigenvector. But how can I link these labels to my original dataset?

azal

1 Answer

The labels map back to the original dataset by index: the ith label belongs to the ith row of the embedding, which corresponds to the ith row of A.

So if yi is in Cm, then the ith entry of A will be in Am.

Or, to put it another way:

Let C1, ..., CM be the clusters generated by clustering the points yi (the rows of the eigenvector matrix). The clusters you want are A1, ..., AM, where Ai = { j | yj is an element of Ci }.
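
As a minimal sketch of that index mapping (not the asker's actual data): the helper name `group_indices_by_label` and the small `data` and `idx` arrays below are made up for illustration, with `idx` standing in for the label array returned by `kmeans2`.

```python
import numpy as np

def group_indices_by_label(labels, num_clusters):
    """Return a list whose entry c holds the row indices assigned to cluster c."""
    labels = np.asarray(labels)
    return [np.flatnonzero(labels == c) for c in range(num_clusters)]

# Hypothetical stand-ins: 6 original records, and the labels k-means
# assigned to the 6 corresponding rows of the eigenvector matrix.
data = np.array([[0.0, 0.1], [5.0, 5.2], [0.2, 0.0],
                 [5.1, 4.9], [9.0, 9.1], [0.1, 0.3]])
idx = np.array([0, 1, 0, 1, 2, 0])

clusters = group_indices_by_label(idx, num_clusters=3)

# Row i of the embedding, row i of A and record i of the data share the
# same index, so the index sets select the original records directly.
cluster_0_records = data[clusters[0]]  # records 0, 2 and 5
```

Here `clusters[i]` plays the role of Ai above, and no rearranging of the affinity matrix is involved.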

bearrito
  • Does this apply to my affinity matrix, or to the original data the affinity matrix came from? – azal Nov 08 '14 at 16:32
  • This corresponds to the affinity matrix. This is a really helpful guide to these techniques: http://www.cs.columbia.edu/~jebara/4772/papers/Luxburg07_tutorial.pdf – bearrito Nov 08 '14 at 16:34
  • Actually, I have read that, but I still cannot understand some things. So, after I obtain the clusters, do I have to rearrange my affinity matrix based on the labels? – azal Nov 08 '14 at 16:39
  • I'm not following why you would think you need to rearrange the matrix. The indices just indicate which rows in the similarity matrix are in the same cluster. – bearrito Nov 08 '14 at 16:47
  • Yes, but I need to separate my original data into clusters and plot them. How can I do that? Should I loop with for i in range(num_clusters) and split my original data into num_clusters groups? – azal Nov 08 '14 at 16:59
  • Well, the row indices of the similarity matrix are just the indices of the original data matrix, so if you know the cluster indices in terms of the similarity matrix, you can map them onto your original dataset. In other words, you could loop over the indices in each cluster, pull those records from the original data matrix, and plot them. – bearrito Nov 08 '14 at 17:51
  • So, basically, if my original data is a list of 1000 items and I have 20 clusters, I have to create 20 lists (clusters) by splitting my original 1000 items? – azal Nov 08 '14 at 17:54
  • Yes, that is the general idea. There might be a more efficient way to plot using some sort of index math, so that you don't have to make copies of the data, but the main idea is sound. – bearrito Nov 08 '14 at 18:06
  • That means that instead of 1 list with 1000 items, I will create k lists whose sizes sum to 1000. – azal Nov 08 '14 at 18:29
  • That's correct: the union of the lists should be the original dataset, and no two lists should share any element (record). – bearrito Nov 08 '14 at 18:35
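
The "index math" mentioned in the comments could look roughly like the sketch below, which splits 1000 item indices into 20 disjoint groups with one sort instead of one pass per cluster. The function name `partition_by_label` and the random `idx` labels are assumptions for illustration; in the question, `idx` would be the label array returned by `kmeans2`.

```python
import numpy as np

def partition_by_label(labels, num_clusters):
    """Split the indices 0..n-1 into num_clusters disjoint arrays, one per label."""
    labels = np.asarray(labels)
    order = np.argsort(labels, kind="stable")             # indices grouped by label
    counts = np.bincount(labels, minlength=num_clusters)  # size of each cluster
    return np.split(order, np.cumsum(counts)[:-1])        # cut at cluster boundaries

# Hypothetical labels: 1000 items assigned to 20 clusters at random.
rng = np.random.default_rng(0)
idx = rng.integers(0, 20, size=1000)

parts = partition_by_label(idx, 20)

# The groups are disjoint and together cover every index exactly once.
all_indices = np.concatenate(parts)
```

Each `parts[c]` is an index array, so the records of cluster c can be pulled out of the original data by indexing with it. For a simple 2-D scatter plot, matplotlib's `scatter` also accepts the label array directly through its `c` argument, which avoids splitting the data at all.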