I want to do hierarchical clustering with the fastcluster module. When I use the default (Euclidean) distance metric, it works fine:

import fastcluster
import scipy.cluster.hierarchy
from scipy import spatial  # needed for spatial.distance.pdist below
distance = spatial.distance.pdist(data)
linkage = fastcluster.linkage(distance, method="complete")

The problem appears when I use cosine distance (pdist's 'cosine' metric) instead:

distance = spatial.distance.pdist(data,'cosine')
linkage = fastcluster.linkage(distance,method="complete")

The output is:

Traceback (most recent call last):
  File "C:\djcode\mysite\mysite\scipytest.py", line 52, in <module>
    linkage = fastcluster.linkage(distance,method="complete")
  File "C:\Python33\lib\site-packages\fastcluster.py", line 245, in linkage
    linkage_wrap(N, X, Z, mthidx[method])
FloatingPointError: NaN dissimilarity value.
user1680859
  • I'm guessing here, but it seems that (at least) one of the vectors you want to cluster is all zeros, so when it tries to compute the cosine distances to it there is a division by zero, hence `nan` is stored in your `distance` array, and that leads to your error. If this is what's happening, there are ways to work around your error, but you first need to figure out what you want to do with those zero vectors. – Jaime Sep 29 '13 at 16:43
  • Thank you @Jaime, that was the problem. The vectors that I tested were automatically constructed (from text), so I should work around this error. – user1680859 Sep 29 '13 at 18:39
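
One possible workaround for the zero-vector case described in the comments is to drop the all-zero rows before computing the distances. The sketch below is only an illustration under a few assumptions: the `data` array is made-up toy data, dropping the rows is assumed to be acceptable for your use case (you may prefer to handle them some other way), and `scipy.cluster.hierarchy.linkage` is used so the snippet runs without fastcluster installed. `fastcluster.linkage(distance, method="complete")` can be called the same way, as in the question.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage  # stand-in for fastcluster.linkage

# Hypothetical toy data: the third row is all zeros, so its cosine
# distance to any other row is undefined (division by a zero norm).
data = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [0.0, 0.0, 0.0],
    [2.0, 1.0, 0.0],
])

# True for rows that have at least one non-zero entry.
nonzero_rows = np.any(data != 0, axis=1)

# Remember which rows were dropped so they can be handled separately.
zero_row_indices = np.flatnonzero(~nonzero_rows)

# Cluster only the rows with a well-defined cosine distance.
filtered = data[nonzero_rows]
distance = pdist(filtered, 'cosine')

# Sanity check before clustering: no NaN values should remain.
if np.isnan(distance).any():
    raise ValueError("distance array still contains NaN values")

Z = linkage(distance, method="complete")
```

Row `i` of `filtered` corresponds to row `np.flatnonzero(nonzero_rows)[i]` of the original `data`, which is useful when mapping cluster labels back to the original vectors.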
