I have a pandas DataFrame with 100 rows and 10,000 features. I want to fit hierarchical clustering on it, using Pearson correlation as the affinity argument of sklearn.cluster.FeatureAgglomeration.

I've tried two approaches so far. The first:

feature_agglomator = FeatureAgglomeration(n_clusters=10, affinity=np.corrcoef, linkage='average')

The second one:

from scipy.spatial.distance import correlation
feature_agglomator = FeatureAgglomeration(n_clusters=10, affinity='correlation', linkage='average')

After running:

feature_agglomator.fit_transform(X)

Both ended with the same exception:

ValueError: The condensed distance matrix must contain only finite values.

What can I do to make it work properly?

Bruce Kent
    I think you should read these two GitHub threads related to your issue: https://github.com/scikit-learn/scikit-learn/issues/7689 and https://github.com/scikit-learn/scikit-learn/issues/10076 Both seem to point to scipy refusing to perform agglomerative clustering when using cosine distance with zero vectors. – d_kennetz Aug 14 '18 at 19:13
    I think that the correlation is giving you NaN. Check out your input values. – Norhther Aug 14 '18 at 19:17
    @Norhther you were right, I had columns filled with 0's. Thanks! – Bruce Kent Aug 15 '18 at 08:36
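
The resolution in the comments above can be sketched as follows. Pearson correlation divides by each feature's standard deviation, so a constant column (such as one filled with 0's) yields NaN distances, which is what the ValueError complains about. This is only a minimal sketch under some assumptions: the helper names `drop_constant_columns` and `make_agglomerator` and the random toy data are hypothetical, not from the question, and the `try`/`except` is there because newer scikit-learn releases call the distance argument `metric` while older ones call it `affinity`.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration


def drop_constant_columns(X):
    """Return X without constant columns, plus the boolean keep-mask.

    Hypothetical helper: a column whose max equals its min has zero
    variance, so its correlation with any other column is undefined.
    """
    X = np.asarray(X, dtype=float)
    keep = X.max(axis=0) != X.min(axis=0)
    return X[:, keep], keep


def make_agglomerator(n_clusters):
    """Build the estimator with whichever keyword this version accepts."""
    try:
        # Newer scikit-learn releases: the argument is named `metric`.
        return FeatureAgglomeration(
            n_clusters=n_clusters, metric="correlation", linkage="average"
        )
    except TypeError:
        # Older releases: the same argument is named `affinity`.
        return FeatureAgglomeration(
            n_clusters=n_clusters, affinity="correlation", linkage="average"
        )


# Toy stand-in for the real data: 20 samples, 50 features,
# two of which are all zeros (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X[:, [3, 17]] = 0.0

X_clean, keep = drop_constant_columns(X)
X_reduced = make_agglomerator(5).fit_transform(X_clean)
```

The `keep` mask records which original columns survived, so cluster labels in `labels_` can be mapped back to the original feature positions if needed.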
