Why is K-means so widely used for document clustering?

Asked Oct 02 '18 at 09:55

Active Oct 02 '18 at 23:20

Viewed 99 times

Can someone explain to me why the K-means algorithm is so widely used (especially in document clustering) despite its drawbacks, rather than, for example, K-medoids, agglomerative hierarchical clustering (CAH), or SOM?

edited Oct 02 '18 at 23:20

Has QUIT--Anony-Mousse

asked Oct 02 '18 at 09:55

Themis

Says who? Just because k-means is a popular textbook intro does not mean it is actually used much (in particular not *successfully*). And why do you think k-medoids would be a better choice? Document collections tend to be huge. – Has QUIT--Anony-Mousse Oct 02 '18 at 23:22
Thanks Anony-Mousse. This is quite a new subject for me. I have seen many document clustering applications in scientific articles that use K-means, hence my bias. Regarding K-medoids, I read that it was designed to be more robust to noise and outliers than K-means. My dataset is not really huge, so K-medoids could be better. – Themis Nov 12 '18 at 13:57
Or it could be worse. Or both could be really bad, just that the average author doesn't know anything better. Or doesn't care, as long as there is *some* output... – Has QUIT--Anony-Mousse Nov 12 '18 at 14:33
You're right! You seem to be an expert in data mining and clustering. Which algorithms do you use the most? – Themis Nov 12 '18 at 16:42
Whatever fits the problem best. I don't want to name "favorites" because I always tell people to make informed choices, not guesses based on general recommendations. – Has QUIT--Anony-Mousse Nov 12 '18 at 17:36

0 Answers