I'm currently working on an image dataset (250 000 images, so 250 000 feature vectors, each composed of 132 features) and trying to cluster it with the KMeans class provided by sklearn.

I run it on Mac OS X 10.10 with Python 2.7 and sklearn 0.15.2. After a while, the only output I get is:

Killed: 9

This happens when running the following lines:

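import sklearn.cluster
import sklearn.metrics

# X: feature matrix of shape (n_images, 132), built earlier
nb_cls = int(raw_input("Number of clusters chosen: "))
clusterer = sklearn.cluster.KMeans(n_clusters=nb_cls)
clusters_labels = clusterer.fit_predict(X)
# Without this call the script is not killed
silhouette = sklearn.metrics.silhouette_score(X, clusters_labels)
print "n clusters =", nb_cls, "/ silhouette_score =", silhouette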

Please note that without the silhouette score calculation, the code isn't killed.

For smaller datasets (about 2 500 images) the same code runs fine and no such error occurs.

How could I avoid this Killed 9 error? Is this calculation too ambitious for my laptop?

Remi Guan
Julian

1 Answer

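It means your script was killed by the OS: "Killed: 9" is what the shell prints when a process receives signal 9 (SIGKILL). In most cases this happens because the process was using too much memory. That seems likely in your case, since your code works fine when you use only 2 500 images.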
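To get a feel for the scale, here is a rough back-of-the-envelope estimate. It assumes that the silhouette computation stores a dense pairwise distance matrix of 8-byte floats, which is how older scikit-learn releases worked as far as I know; I have not checked 0.15.2 specifically. The helper name `pairwise_matrix_bytes` is made up for this sketch.

```python
def pairwise_matrix_bytes(n_samples, bytes_per_value=8):
    """Bytes needed for a dense n_samples x n_samples distance matrix.

    Assumption: one 8-byte float per pair of samples.
    """
    return n_samples * n_samples * bytes_per_value


for n in (2500, 250000):
    gigabytes = pairwise_matrix_bytes(n) / 1e9
    print("%d samples: about %.2f GB" % (n, gigabytes))
```

Under that assumption, 2 500 images need about 0.05 GB for the matrix, while 250 000 images need about 500 GB, far beyond what a laptop can hold.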

If it is a memory problem, you will have to either add more RAM (which may not be possible on a Mac), use another computer with more RAM, or reduce the size of the dataset.
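One way to reduce the data for the scoring step only is the `sample_size` parameter of `silhouette_score`, which computes the score on a random subset of the points. The sketch below uses random stand-in data instead of the real image features, and the sizes (2 000 points, 5 clusters, a sample of 500) are arbitrary choices for illustration, so treat it as a starting point and not as a tuned setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in for the real feature matrix (hypothetical random data, 132 features).
rng = np.random.RandomState(0)
X = rng.rand(2000, 132)

# Cluster all points as before.
clusterer = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = clusterer.fit_predict(X)

# Score a random subset of 500 points instead of the whole dataset.
score = silhouette_score(X, labels, sample_size=500, random_state=0)
print("approximate silhouette score: %.3f" % score)
```

The result is an estimate of the full score, so it can vary with the random subset; repeating it with a few different `random_state` values gives an idea of how stable it is.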

TheWalkingCube
  • OK. I'm working with 16 GB of RAM, but I agree that may not be enough. Is there a way to prevent the OS from killing the script, even if it then takes more time? – Julian May 19 '15 at 09:42
  • I don't know. But if silhouette_score requires all the data to be loaded in memory, that will probably not be possible anyway. I guess you should first confirm that it is a memory problem by monitoring memory use during a run. – TheWalkingCube May 19 '15 at 10:57
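A minimal sketch of such monitoring from inside the script, using the standard `resource` module (Unix-like systems only). The helper name `peak_memory_mb` and the placeholder workload are made up for this example; in practice the workload would be the clustering and scoring code.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


before = peak_memory_mb()
data = [0.0] * (5 * 10 ** 6)  # placeholder workload, replace with the real code
after = peak_memory_mb()
print("peak memory before: %.1f MB, after: %.1f MB" % (before, after))
```

Because a killed process never reaches its final print, it is more informative to run this on progressively larger subsets (2 500, 5 000, 10 000 images and so on) and watch how the peak grows.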