implementation of Hierarchical Agglomerative clustering

Question

I am a newbie and just want to implement Hierarchical Agglomerative clustering for RGB images. For this I extract all the RGB values from an image and process them. Next I compute the pairwise distances and then build the linkage. Now, from the linkage, I want to extract my original data (i.e. the RGB values) at the specified indices, using the index IDs. Here is the code I have so far:

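import numpy as np
from PIL import Image
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent

image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1, 3))  # one row per pixel, shape (height*width, 3)
rgb = im.tolist()                      # [R, G, B] list for each pixel, same order as im
X = pdist(im)                          # condensed pairwise Euclidean distances
Y = linkage(X)                         # single-linkage tree, shape (n-1, 4)
I = inconsistent(Y)                    # 4th column holds the inconsistency coefficients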
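To see what the linkage output actually contains, here is a minimal sketch. It assumes SciPy's default single linkage and uses five pixels copied from the sample `im` array in the comments; the loop only prints each merge step so the meaning of the four columns is visible.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Five pixels taken from the sample im array in the comments (rows 0, 3, 4, 17, 18)
pixels = np.array([[54, 101, 9], [55, 106, 1], [52, 108, 0],
                   [198, 25, 21], [203, 11, 10]], dtype=float)
n = len(pixels)

X = pdist(pixels)   # condensed distances: n*(n-1)/2 entries
Y = linkage(X)      # default method='single'

# Each row of Y records one merge: [left id, right id, distance, size].
# Ids below n are original pixels (rows of `pixels`); id n+i is the
# cluster created in row i of Y.
for i, (a, b, dist, size) in enumerate(Y):
    print("step %d: merge %d and %d at distance %.2f -> cluster %d (%d pixels)"
          % (i, a, b, dist, n + i, size))
```

Only ids smaller than `n` point back into the pixel array; larger ids are intermediate clusters, which is why the tree alone does not hand back RGB values.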

Based on the 4th column of the inconsistency matrix (the inconsistency coefficient), I chose the minimum value for the cutoff in order to get the maximum number of clusters.
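A rough sketch of how the cutoff relates to that 4th column, assuming SciPy's defaults (`fcluster` with `criterion='inconsistent'` and depth 2) and using the first ten pixels of the sample `im` array from the comments. The exact cluster counts depend on the data; only the trend (lower cutoff, more clusters) is the point here.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent, fcluster

# First 10 pixels of the sample im array from the comments
im = np.array([[54, 101, 9], [67, 89, 27], [67, 85, 25], [55, 106, 1],
               [52, 108, 0], [55, 78, 24], [19, 57, 8], [19, 46, 0],
               [95, 110, 15], [112, 159, 57]], dtype=float)

Y = linkage(pdist(im))
I = inconsistent(Y)   # columns: mean height, std of heights, link count, coefficient
coeff = I[:, 3]       # the 4th column is the inconsistency coefficient
print(np.round(coeff, 3))

# fcluster's default criterion is 'inconsistent', so t is compared with these
# coefficients: a lower cutoff lets fewer merges through and leaves more clusters.
for t in (0.5, 0.7, 1.0, 1.5):
    labels = fcluster(Y, t)
    print("cutoff %.1f -> %d clusters" % (t, labels.max()))
```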

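import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import fcluster, dendrogram

def cluster_indices(assignments):
    # sketch of the helper: one array of data-point indices per cluster label 1..max
    return [np.where(assignments == k)[0] for k in range(1, assignments.max() + 1)]

cutoff = 0.7
cluster_assignments = fcluster(Y, cutoff)  # cut the tree Y; default criterion is 'inconsistent'
# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print("%d clusters" % num_clusters)
indices = cluster_indices(cluster_assignments)
for k, ind in enumerate(indices):
    print("cluster", k + 1, "is", ind)
dendrogram(Y)
plt.show()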

I got results like this:

cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]

This means cluster 6 contains the leaves with indices 6 and 11. At this point I am stuck on how to map these indices back to the original data (i.e. the RGB values), since each index corresponds to one pixel in the image. After that I have to generate a codebook to implement agglomerative clustering. I have no idea how to approach this task; I have read a lot, but nothing gave me a clue.
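The leaf indices refer to rows of the reshaped `im` array, so `im[indices]` returns their RGB values directly, and `np.unravel_index` inverts the row-major `reshape((-1, 3))` to recover pixel positions. A minimal sketch, assuming a made-up 4 x 7 image (the shape mentioned in the comments) whose values are just a counting sequence; `image_array` and `cluster` are illustrative names.

```python
import numpy as np

# Hypothetical 4 x 7 RGB image with made-up values (same shape as in the comments)
image_array = np.arange(4 * 7 * 3, dtype=np.uint8).reshape((4, 7, 3))
h, w, _ = image_array.shape

im = image_array.reshape((-1, 3))  # row-major: flat index = row * w + col

cluster = np.array([6, 11])        # leaf indices as printed for "cluster 6"
rows, cols = np.unravel_index(cluster, (h, w))

# RGB values of the cluster: index im directly with the leaf indices
print(im[cluster])

for idx, r, c in zip(cluster, rows, cols):
    # im[idx] and image_array[r, c] hold the same [R, G, B] triple
    print("index %d -> pixel (row %d, col %d), RGB %s" % (idx, r, c, im[idx]))
```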

I have many questions about your code. **1**: Why reshape the image into (-1, 3); what do -1 and 3 mean? **2**: There is no `getdata()` method for an ndarray object. **3**: Why call `fclusterdata()` on the return value of `linkage()`? I think it should be called on `im`. **4**: What is the `cluster_indices()` function? — HYRY, Dec 25 '13 at 00:44
1: The original image has shape (4,7,3), but the pdist function accepts only a 2-D array, so I reshaped it using reshape(-1,3) to shape (28,3). 2: I thought I could get all the pixel values (RGB values) using getdata(), which might then be mapped to the cluster indices, but I don't know whether that is the correct way. 3: I read the hierarchical clustering example for MATLAB at http://www.mathworks.de/de/help/stats/hierarchical-clustering.html. They applied fclusterdata() to the linkage output. — mGm, Dec 25 '13 at 11:51
4: I am not sure about the indices; with a couple of lines of code I was only able to get the cluster indices from fclusterdata. An example output is 11223441251111, which means fcluster made 5 clusters, and equal numbers indicate the same cluster. — mGm, Dec 25 '13 at 11:51
im array [[ 54 101 9] [ 67 89 27] [ 67 85 25] [ 55 106 1] [ 52 108 0] [ 55 78 24] [ 19 57 8] [ 19 46 0] [ 95 110 15] [112 159 57] [ 67 118 26] [ 76 127 35] [ 74 128 30] [ 25 62 0] [100 120 9] [127 145 61] [ 48 112 25] [198 25 21] [203 11 10] [127 171 60] [124 173 45] [120 133 19] [109 137 18] [ 60 85 0] [ 37 0 0] [187 47 20] [127 170 52] [ 30 56 0]] — mGm, Dec 25 '13 at 14:12
If you send me your email address, I can send you what I am trying to do. — mGm, Dec 26 '13 at 12:56
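On point 3, a sketch comparing the two SciPy routes on the sample pixels, assuming default arguments everywhere: cutting an existing linkage matrix is what `fcluster` does, while `fclusterdata` expects the raw observations and builds the tree itself. With identical defaults the two label arrays should match, and feeding the linkage matrix to `fclusterdata` yields one label per merge row instead of one per pixel.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata

# The 28 pixels from the comment above, one [R, G, B] row each
im = np.array([[54, 101, 9], [67, 89, 27], [67, 85, 25], [55, 106, 1], [52, 108, 0],
               [55, 78, 24], [19, 57, 8], [19, 46, 0], [95, 110, 15], [112, 159, 57],
               [67, 118, 26], [76, 127, 35], [74, 128, 30], [25, 62, 0], [100, 120, 9],
               [127, 145, 61], [48, 112, 25], [198, 25, 21], [203, 11, 10], [127, 171, 60],
               [124, 173, 45], [120, 133, 19], [109, 137, 18], [60, 85, 0], [37, 0, 0],
               [187, 47, 20], [127, 170, 52], [30, 56, 0]])

# Route 1: build the tree yourself, then cut it with fcluster
Y = linkage(pdist(im))          # method='single', Euclidean distances
labels_tree = fcluster(Y, 0.7)  # criterion='inconsistent', depth=2 by default

# Route 2: fclusterdata runs pdist + linkage + fcluster on the raw pixels
labels_data = fclusterdata(im, 0.7)

print(np.array_equal(labels_tree, labels_data))

# Passing the linkage matrix to fclusterdata treats its n-1 rows as observations,
# so the labels no longer line up with the pixels
wrong = fclusterdata(Y, 0.7)
print(len(wrong), "labels for", len(im), "pixels")
```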

score 0 · Answer 1 · answered Dec 27 '13 at 03:40

Here is my solution:

import numpy as np
from scipy.cluster import hierarchy

im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
 [ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
 [ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
 [127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
 [124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
 [187,47,20],[127,170,52],[ 30,56,0]])

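groups = hierarchy.fclusterdata(im, 0.7)                  # one cluster label per pixel
idx_sorted = np.argsort(groups)                           # pixel indices ordered by label
group_sorted = groups[idx_sorted]                         # labels in sorted order
im_sorted = im[idx_sorted]                                # RGB rows reordered the same way
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1   # positions where the label changes
np.split(im_sorted, split_idx)                            # list of RGB arrays, one per cluster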

output:

[array([[203,  11,  10],
       [198,  25,  21]]),
 array([[187,  47,  20]]),
 array([[127, 171,  60],
       [127, 170,  52]]),
 array([[124, 173,  45]]),
 array([[112, 159,  57]]),
 array([[127, 145,  61]]),
 array([[25, 62,  0],
       [30, 56,  0]]),
 array([[19, 57,  8]]),
 array([[19, 46,  0]]),
 array([[109, 137,  18],
       [120, 133,  19]]),
 array([[100, 120,   9],
       [ 95, 110,  15]]),
 array([[67, 89, 27],
       [67, 85, 25]]),
 array([[55, 78, 24]]),
 array([[ 52, 108,   0],
       [ 55, 106,   1]]),
 array([[ 54, 101,   9]]),
 array([[60, 85,  0]]),
 array([[ 74, 128,  30],
       [ 76, 127,  35]]),
 array([[ 67, 118,  26]]),
 array([[ 48, 112,  25]]),
 array([[37,  0,  0]])]
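For the codebook part of the question, one possible sketch (not taken from the answer): group the pixels with boolean masks, take each cluster's mean RGB value as its codeword, and rebuild a quantized image from the labels. Using the cluster mean as the codeword is an assumption; `scipy.cluster.vq.vq` is shown only as a way to encode pixels against that codebook.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata
from scipy.cluster.vq import vq

im = np.array([[54, 101, 9], [67, 89, 27], [67, 85, 25], [55, 106, 1], [52, 108, 0],
               [55, 78, 24], [19, 57, 8], [19, 46, 0], [95, 110, 15], [112, 159, 57],
               [67, 118, 26], [76, 127, 35], [74, 128, 30], [25, 62, 0], [100, 120, 9],
               [127, 145, 61], [48, 112, 25], [198, 25, 21], [203, 11, 10], [127, 171, 60],
               [124, 173, 45], [120, 133, 19], [109, 137, 18], [60, 85, 0], [37, 0, 0],
               [187, 47, 20], [127, 170, 52], [30, 56, 0]], dtype=float)

labels = fclusterdata(im, 0.7)                      # one cluster label per pixel
keys, inverse = np.unique(labels, return_inverse=True)

# RGB values of each cluster via a boolean mask (same groups as the sort-and-split answer)
groups = [im[labels == k] for k in keys]

# Codebook (assumed definition): the mean color of each cluster, one row per cluster
codebook = np.array([g.mean(axis=0) for g in groups])

# Quantized pixels: every pixel replaced by its own cluster's mean color
quantized = codebook[inverse]

# vq assigns each pixel to the nearest codeword and returns (codes, distances)
codes, dists = vq(im, codebook)
print(len(codebook), "colors in the codebook")
```

Because single linkage can produce elongated clusters, the nearest codeword chosen by `vq` may not always match the hierarchical label, which is why `quantized` is built from `inverse` rather than from `codes`.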

Thanks for the time you spent on this piece of work. — mGm, Dec 27 '13 at 15:46
