
From the documentation of sklearn KMeans

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1)

and SciPy kmeans

scipy.cluster.vq.kmeans(obs, k_or_guess, iter=20, thresh=1e-05, check_finite=True)

it is clear that the number of parameters differs, and sklearn appears to expose more of them.

Have any of you tried one versus the other and would you have a preference for using one of them in a classification problem?
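For concreteness, here is a hypothetical side-by-side sketch (not from the question; the data and parameter values are made up for illustration) calling both implementations on the same synthetic data:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

# SciPy: a plain function that returns the codebook (centroids) and the
# mean distortion; cluster assignments need a separate call to vq().
centroids, distortion = kmeans(X, 3)
scipy_labels, _ = vq(X, centroids)

# sklearn: an estimator object with the usual fit/predict interface and
# fitted attributes such as labels_ and cluster_centers_.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sklearn_labels = km.labels_
```

Beyond the raw parameter count, the practical difference is the interface: SciPy gives you a bare function, while sklearn gives you an estimator that plugs into pipelines, grid search, and the rest of the toolkit.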

pepe
  • Without trying it, I would always prefer sklearn: better documentation (including user guides) and many more tools you would likely use too, like cross-validation/grid search. But that's just my opinion. – sascha May 13 '16 at 14:29
  • The scipy implementation gives you the option to set your own centroids, which can be nice. Also note that for most applications you'll want to use [kmeans2](http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.vq.kmeans2.html), not the one you quote. Besides that, I can't say. – patrick May 13 '16 at 16:06
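To illustrate the custom-centroids point from the comment above (a hypothetical sketch with made-up data): passing an array as `k_or_guess` makes SciPy start from exactly those centroids. Note that sklearn's `init` parameter also accepts an ndarray, so strictly this is possible in both libraries.

```python
import numpy as np
from scipy.cluster.vq import kmeans

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))

# When k_or_guess is an array rather than an int, SciPy runs a single
# k-means pass starting from exactly these centroids.
guess = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
centroids, distortion = kmeans(X, guess)
```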

1 Answer


Benchmark.

And you will never touch the scipy one again.

Has QUIT--Anony-Mousse
  • It seems difficult to compare one to the other -- the SciPy params do not map perfectly onto the sklearn ones: for example, sklearn defaults to 10 initializations (`n_init=10`), while SciPy isn't explicit about this. Using 100 centroids for both and the other params at their defaults, SciPy is faster, but faster doesn't mean better. – pepe May 15 '16 at 01:53
  • Disable all the extras. `n_init=1`, `tol=thresh=0`, `max_iter=iter=100000` (you want the final result, not an interim result). Use a *large* data set. – Has QUIT--Anony-Mousse May 15 '16 at 07:22
  • Scipy has less overhead. A major advantage when running on small datasets. – Michael Mezher Jan 23 '23 at 19:52
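The matched-settings comparison suggested in these comments can be sketched roughly as follows. This is a hypothetical benchmark with made-up sizes: following patrick's pointer, it uses `kmeans2` (whose `iter` is an iteration cap, unlike `kmeans`, where `iter` is a restart count), one initialization per side, matched random-point init, and `tol=0` on the sklearn side so neither stops early. Timings will vary by machine and library version.

```python
import time
import numpy as np
from scipy.cluster.vq import kmeans2
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(20_000, 10))
k = 50

# SciPy: one run, 50 Lloyd iterations, random observations as initial centroids.
t0 = time.perf_counter()
sp_centroids, sp_labels = kmeans2(X, k, iter=50, minit='points')
t_scipy = time.perf_counter() - t0

# sklearn: matched settings -- one init, same iteration cap, tol=0 so the
# tolerance criterion never triggers an early stop.
t0 = time.perf_counter()
km = KMeans(n_clusters=k, init='random', n_init=1, max_iter=50, tol=0,
            random_state=42).fit(X)
t_sklearn = time.perf_counter() - t0

print(f"scipy kmeans2: {t_scipy:.2f}s   sklearn KMeans: {t_sklearn:.2f}s")
```

On larger data sets like this, sklearn's optimized solvers tend to win; as the last comment notes, SciPy's lower call overhead can still make it competitive on small inputs.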