
Can someone explain to me why the k-means algorithm is so widely used (especially in document clustering) despite its shortcomings, instead of, for example, k-medoids, hierarchical agglomerative clustering (CAH), self-organizing maps (SOM), etc.?

Edited by Has QUIT--Anony-Mousse; asked by Themis
  • Says who? Just because k-means is a popular textbook intro does not mean it is actually used much (in particular not *successfully*). And why do you think k-medoids would be a better choice? Document collections tend to be huge. – Has QUIT--Anony-Mousse Oct 02 '18 at 23:22
  • Thanks, Anony-Mousse. This subject is quite new to me. I have seen many document clustering applications in scientific articles that use k-means, hence my bias. Regarding k-medoids, I read that it was designed to be more robust to noise and outliers than k-means. My dataset is not really huge, so k-medoids could be better. – Themis Nov 12 '18 at 13:57
  • Or it could be worse. Or both could be really bad, and the average author just doesn't know anything better. Or doesn't care, as long as there is *some* output... – Has QUIT--Anony-Mousse Nov 12 '18 at 14:33
  • You're right! You seem to be an expert in data mining and clustering. Which algorithms do you use most? – Themis Nov 12 '18 at 16:42
  • Whatever fits the problem best. I don't want to name "favorites" because I always tell people to make informed choices, not guesses based on general recommendations. – Has QUIT--Anony-Mousse Nov 12 '18 at 17:36
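The two points raised in the comments (k-medoids being more robust to outliers, and document collections being large) can be illustrated with a minimal sketch. It assumes Euclidean distance and a few made-up 2-D points; real document clustering would work on high-dimensional vectors (for example TF-IDF), and the helper names `mean` and `medoid` are hypothetical, not from any library.

```python
import math

def mean(points):
    """Coordinate-wise mean: the cluster 'center' used by k-means.
    One linear pass over the points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def medoid(points):
    """The member point with the smallest total distance to all members:
    the cluster 'center' used by k-medoids. Evaluating every candidate
    against every member needs O(n^2) distance computations."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# A tight cluster of five points around (1, 1) (made-up data) ...
cluster = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
# ... and the same cluster with one far-away outlier added.
with_outlier = cluster + [(50.0, 50.0)]

if __name__ == "__main__":
    print("mean without outlier:  ", mean(cluster))
    print("mean with outlier:     ", mean(with_outlier))
    print("medoid without outlier:", medoid(cluster))
    print("medoid with outlier:   ", medoid(with_outlier))
```

With the outlier added, the mean's x-coordinate jumps from 1.0 to about 9.17, while the medoid stays at (1.0, 1.0). The price is the quadratic pairwise-distance loop inside `medoid`, which is one reason k-medoids scales worse than k-means on very large collections.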
