
According to the sklearn KMeans documentation, k-means requires a matrix of shape=(n_samples, n_features). But I provided a distance matrix of shape=(n_samples, n_samples), where each entry holds the distance between two strings. The time series were converted into strings using the SAX representation.

When I ran the clustering on the distance matrix, it gave good results. What could be the reason for this? As far as I know, K-medoids is the algorithm that works with a distance matrix.
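For comparison, here is a minimal sketch of why K-medoids can run on distances alone. It assumes a symmetric precomputed matrix `D` with a zero diagonal and distinct items. It is a simple alternating ("Voronoi iteration") variant written for illustration, not PAM and not any library's implementation; `k_medoids` is a hypothetical helper name.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Hypothetical alternating k-medoids on a precomputed (n, n) distance matrix D.
    Only entries of D are read; no feature vectors or means are ever computed."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each item joins the cluster of its closest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: pick the member with the smallest total distance
            # to the other members. This uses distances, not an average.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# Hypothetical distances between 4 items (e.g. SAX strings).
D = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 1.5],
    [7.0, 6.0, 1.5, 0.0],
])
medoids, labels = k_medoids(D, k=2)
print(medoids, labels)
```

The medoid update only asks "which existing item is most central?", which is why a distance matrix is enough here.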

Shivam Mitra

1 Answer


K-means, as the name indicates, uses means.

Computing the arithmetic mean requires access to the original features; it cannot be done from a distance matrix.
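A minimal sketch of one k-means (Lloyd) iteration in NumPy makes this concrete. The toy data and the helper name `lloyd_step` are hypothetical. The update step averages the feature vectors assigned to each cluster, and there is no way to form that average from pairwise distances alone.

```python
import numpy as np

def lloyd_step(X, centers):
    """One Lloyd iteration on a feature matrix X of shape (n_samples, n_features)."""
    # Assignment step: squared Euclidean distance from every sample to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Update step: each new center is the arithmetic mean of its members' features.
    # (Assumes no cluster is empty; an empty cluster would yield a NaN mean.)
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
    return labels, new_centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = X[[0, 2]].copy()  # start from samples 0 and 2
labels, centers = lloyd_step(X, centers)
print(labels)  # [0 0 1 1]
```

After this step the centers are `[0, 0.5]` and `[10, 10.5]`: points that did not exist in the data, which is exactly what a distance matrix cannot provide.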

K-means also does not use pairwise distances between samples; it only measures how far each sample is from the current cluster means. So the distance matrix is useless for this algorithm.
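sklearn's `KMeans` has no way of knowing that its input is a distance matrix. A minimal sketch on hypothetical synthetic data shows what it actually does with one. It treats row i of `D` (sample i's distances to all n samples) as an n-dimensional feature vector, so the fitted centers are means of distance rows. Well-separated groups can still produce clearly different rows, which may explain a result that looks reasonable. However, the algorithm is then comparing rows by Euclidean distance, not clustering with the distances `D` encodes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated groups of 2-D points.
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)), rng.normal(5, 0.5, size=(10, 2))])
D = pairwise_distances(X)  # square distance matrix, shape (20, 20)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(D)
# Each center lives in the 20-dimensional space of "distance rows",
# not in the original 2-D space of the points.
print(km.cluster_centers_.shape)  # (2, 20)
```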

Choose a different algorithm instead, such as hierarchical clustering.
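A minimal sketch of hierarchical clustering on a precomputed distance matrix, using SciPy, follows. The 4×4 matrix is hypothetical. `linkage` expects a condensed distance vector, which `squareform` produces from a symmetric matrix with a zero diagonal. Average linkage is used because, unlike Ward, it does not assume Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for 4 items (e.g. distances between SAX strings).
D = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 1.5],
    [7.0, 6.0, 1.5, 0.0],
])

# Convert the square matrix to condensed form, then build the dendrogram.
Z = linkage(squareform(D), method="average")
# Cut the tree into at most 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Items 0 and 1 end up in one cluster and items 2 and 3 in the other. Only the entries of `D` were needed.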

Has QUIT--Anony-Mousse