I am trying to do hierarchical clustering in Python over a collection of documents. I used scipy.cluster.hierarchy with method='average' and metric='cosine', as below:
from sklearn.metrics import pairwise_distances
import fastcluster
distMatrix = pairwise_distances(X_normalized, metric='cosine')
L = fastcluster.linkage(distMatrix, method='average')
I am having trouble interpreting the output of the linkage method, since some of the merge distances (third column) are greater than 1. How is that possible when the metric I am using is cosine? Isn't the distance supposed to be less than or equal to 1?
[[  7.  22.  0.          2. ]
 [ 14.  27.  0.          2. ]
 [ 33.  34.  0.266383    2. ]
 [  2.  12.  0.77866776  2. ]
 [ 18.  20.  1.09118911  2. ]
 [  0.   6.  1.09586741  2. ]
 [ 26.  30.  1.09711324  2. ]
 [ 32.  42.  1.12491309  3. ]
 [ 15.  16.  1.12715133  2. ]
 [  5.  21.  1.18961564  2. ]
 [  4.   8.  1.21144117  2. ]
 [  3.  24.  1.21711052  2. ]
 [  9.  17.  1.26018569  2. ]
 [  1.  23.  1.27712536  2. ]
 [ 35.  41.  1.34423149  3. ]
 [ 13.  45.  1.36113739  3. ]
 [ 28.  46.  1.38535987  3. ]
 [ 29.  40.  1.40081718  3. ]
 [ 31.  44.  1.42614738  3. ]
 [ 25.  51.  1.42704815  4. ]
 [ 11.  50.  1.43200913  4. ]
 [ 10.  53.  1.44240297  4. ]
 [ 47.  54.  1.4833146   5. ]
 [ 19.  55.  1.48739052  5. ]
 [ 48.  52.  1.49125894  5. ]
 [ 49.  59.  1.50473572  7. ]
 [ 58.  60.  1.55300865 10. ]
 [ 57.  62.  1.56317408 14. ]
 [ 56.  61.  1.5656443  11. ]
 [ 63.  64.  1.58042986 25. ]
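
One detail that may be relevant here: as far as I understand, scipy.cluster.hierarchy.linkage (and fastcluster.linkage, which follows the same interface) reads a 2-D array as a set of observation vectors and only a 1-D condensed array as precomputed distances. The sketch below is a rough way to compare the two inputs. It assumes random non-negative data in place of my documents and SciPy's linkage in place of fastcluster's; the names X, dist_square and dist_condensed are made up for illustration.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the document vectors: 35 rows, non-negative features.
rng = np.random.default_rng(0)
X = rng.random((35, 50))

# Square 35x35 cosine distance matrix, comparable to distMatrix above.
dist_square = squareform(pdist(X, metric="cosine"))

# Passing the square matrix: the 2-D input is read as 35 observation vectors,
# so its rows are clustered with the default Euclidean metric.
# SciPy may warn that the input looks like an uncondensed distance matrix.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    L_square = linkage(dist_square, method="average")

# Passing the condensed (1-D) form: the values are used as the distances.
dist_condensed = squareform(dist_square, checks=False)
L_condensed = linkage(dist_condensed, method="average")

# Compare the largest cosine distance with the largest merge height of each run.
max_cosine = dist_square.max()
max_height_square = L_square[:, 2].max()
max_height_condensed = L_condensed[:, 2].max()
print(max_cosine, max_height_square, max_height_condensed)
```

With the condensed input, the average-linkage heights are averages of the supplied pairwise distances, so they cannot exceed the largest entry of the distance matrix. With the square input, the heights are Euclidean distances between rows of that matrix and are not bounded by the cosine range.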