Okay, @Wes's answer suggested some good functions for the task, but he used them incorrectly. After reading more of the documentation, it seems you need to pass a condensed pairwise distance matrix to the spc.linkage function. That is the upper-triangular part of the distance matrix (the entries above the diagonal), flattened row by row.
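To make the condensed form concrete, here is a small sketch on a made-up 4×4 distance matrix. It uses NumPy's triu_indices and scipy.spatial.distance.squareform, which are my choices for illustration rather than something from the answer itself:

```python
import numpy as np
from scipy.spatial.distance import squareform

# Illustrative symmetric distance matrix with a zero diagonal.
D = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 4.0, 5.0],
    [2.0, 4.0, 0.0, 6.0],
    [3.0, 5.0, 6.0, 0.0],
])

n = D.shape[0]
# Entries above the diagonal, read row by row: n*(n-1)/2 values in total.
condensed = D[np.triu_indices(n, k=1)]
print(condensed.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# squareform converts between the square and the condensed representation.
assert np.array_equal(condensed, squareform(D))
assert np.array_equal(squareform(condensed), D)
```

So a 4×4 matrix collapses to 6 numbers, and squareform can do the conversion in either direction.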
The documentation also says that the pdist function (scipy.spatial.distance.pdist) returns distances in exactly that condensed form. However, its input is NOT a correlation matrix or anything like that. It takes the raw observations (an m-by-n array of m observations in n dimensions) and computes the pairwise distances itself, using the specified metric.
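As a quick illustration of what pdist expects, here is a sketch with three made-up 2-D points (the data is hypothetical). Each row is one observation, and the output is already condensed:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical observations: 3 points (rows) in 2 dimensions (columns).
X = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
])

# pdist computes the distances itself; the result holds the pairs
# (0, 1), (0, 2), (1, 2) in that order.
d = pdist(X, metric='euclidean')
print(d.tolist())  # [5.0, 10.0, 5.0]

# The square form, if you want to look at it as a matrix.
print(squareform(d))
```

Passing a correlation matrix here would make pdist treat each row of that matrix as an observation, which is not what you want.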
Now, it will come as no surprise to you that a covariance or correlation matrix already summarizes the observations as a matrix. Instead of representing distance, though, its entries represent correlation. This is where I am unsure what the most mathematically sound thing to do is, but I believe you can turn the correlation matrix into a distance matrix of sorts simply by computing 1.0 - corr. Perfectly correlated variables (corr = 1) then get distance 0, and the distance grows as the correlation drops, up to 2 for perfectly anti-correlated variables.
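One way to sanity-check the 1.0 - corr idea: SciPy's 'correlation' metric for pdist is defined as 1 minus the Pearson correlation, so running pdist on the variables (the columns of the data, hence X.T) should reproduce 1.0 - corr. A sketch on random, hypothetical data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Hypothetical data: 100 observations (rows) of 5 variables (columns).
X = rng.normal(size=(100, 5))

corr = np.corrcoef(X, rowvar=False)   # 5x5 correlation matrix
dist_from_corr = 1.0 - corr           # the "1.0 - corr" distance

# pdist wants one row per item being compared, so pass the variables as rows.
dist_from_pdist = squareform(pdist(X.T, metric='correlation'))

print(np.allclose(dist_from_corr, dist_from_pdist))  # True
```

So 1.0 - corr is the same quantity that pdist would compute with metric='correlation' if you gave it the raw observations.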
So let's do that:
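import numpy as np
import scipy.cluster.hierarchy as spc
# corr: square correlation matrix (NumPy array or pandas DataFrame, e.g. df.corr())
pdist_uncondensed = 1.0 - np.asarray(corr)  # asarray so iterating yields rows, not DataFrame column labels
pdist_condensed = np.concatenate([row[i+1:] for i, row in enumerate(pdist_uncondensed)])  # upper triangle, row by row
linkage = spc.linkage(pdist_condensed, method='complete')
idx = spc.fcluster(linkage, 0.5 * pdist_condensed.max(), 'distance')  # cut the tree at half the largest distance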
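To see the whole thing end to end, here is a sketch on synthetic data (the data and group structure are made up for illustration): two hidden signals each drive three noisy variables, so the six variables should fall into two groups of three.

```python
import numpy as np
import scipy.cluster.hierarchy as spc

rng = np.random.default_rng(42)
n_obs = 200
# Hypothetical data: variables 0-2 follow signal a, variables 3-5 follow signal b.
a = rng.normal(size=n_obs)
b = rng.normal(size=n_obs)
X = np.column_stack(
    [a + 0.3 * rng.normal(size=n_obs) for _ in range(3)]
    + [b + 0.3 * rng.normal(size=n_obs) for _ in range(3)]
)

corr = np.corrcoef(X, rowvar=False)

# Same steps as above.
pdist_uncondensed = 1.0 - corr
pdist_condensed = np.concatenate([row[i+1:] for i, row in enumerate(pdist_uncondensed)])
linkage = spc.linkage(pdist_condensed, method='complete')
idx = spc.fcluster(linkage, 0.5 * pdist_condensed.max(), 'distance')

print(idx)  # one cluster label per variable

# Reorder the correlation matrix so variables in the same cluster sit together.
order = np.argsort(idx)
corr_sorted = corr[np.ix_(order, order)]
```

The cluster labels are arbitrary integers starting at 1, so only which variables share a label matters, not the label values themselves.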