Processing an image-matrix with K-Means in SKLearn

Question

I want to classify my images with SKLearn. Using Python, I read an image from disk and "remove" the background via adaptive thresholding. I also calculate the mean color of the object (without the background) and the standard deviation of all pixels from that mean color. The thresholded image, the mean color, and the standard deviation are the result of my feature extraction, and I now want to run a clustering algorithm over all my images with their features to group similar images.

However, the KMeans function seems to accept only one-dimensional arrays with a single value per feature for each object to be clustered. In my data, a single object has:

a matrix representing the image: (((r,g,b), (r,g,b), (r,g,b)), ((r,g,b), (r,g,b), (r,g,b)))
a vector with the mean color: (r,g,b)
a vector with the standard deviation for each color channel: (r,g,b)

If I only had the two vectors, I would separate them into 6 features, such as (r, g, b, r2, g2, b2). However, the matrix is 100x100, and this would give me 10,000 additional features, which cannot be the answer.

I would be happy to hear about solutions to my problem or guides to other ways of classifying the images with the features I extracted. Thanks in advance!

score 0 · Answer 1 · answered Nov 28 '17 at 20:51

Yes, you need to reshape your data. And yes, a 100x100x3 image yields 30000 features.
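A minimal sketch of that reshaping, assuming the images are already loaded into one NumPy array. The random `images` array is a hypothetical stand-in for the thresholded images, and the mean and standard deviation here are taken over all pixels rather than only the object pixels:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 20 RGB images of 100x100 pixels.
images = rng.random((20, 100, 100, 3))

# Per-image mean color and standard deviation per channel (3 values each).
mean_color = images.mean(axis=(1, 2))
std_color = images.std(axis=(1, 2))

# Flatten each image into one row: (20, 100, 100, 3) -> (20, 30000).
flat_pixels = images.reshape(len(images), -1)

# One row per image, one column per feature: 30000 + 3 + 3 = 30006 columns.
X = np.hstack([flat_pixels, mean_color, std_color])

# The cluster count of 3 is an arbitrary choice for illustration.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(X.shape, labels.shape)
```

KMeans expects exactly this layout: a 2D array with one row per sample and one column per feature.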

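KMeans tends not to work well on such data because of the curse of dimensionality: in very high-dimensional spaces, the distances between points become nearly equal, so distance-based cluster assignments carry little information.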
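One way to see this effect is to measure how much the nearest and farthest neighbors of a point differ as the number of dimensions grows. The sketch below uses uniformly random points, which is an assumption for illustration only and not a model of real images. `relative_contrast` is a hypothetical helper name:

```python
import numpy as np

def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one point to the rest."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = points[0]
    # Euclidean distances from the query point to every other point.
    dists = np.linalg.norm(points[1:] - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as dimensionality grows toward the 30000 pixel features.
for d in (2, 100, 30000):
    print(d, relative_contrast(d))
```

When the contrast is close to zero, every point is about equally far from every centroid candidate, which is what makes Euclidean-distance clustering unreliable on raw pixels.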

You need to extract features that give a suitable lower-dimensional representation. A few years ago, the answer to that was "bag of visual words". Now it is "deep learning": use one of the last layers of the network as your features, not the classification output layer.
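As a rough illustration of clustering on a lower-dimensional representation, the sketch below swaps in PCA as the feature extractor. PCA is neither of the two approaches named above. It is used here only because it runs with scikit-learn alone, and the random `images` array, the 10 components, and the 3 clusters are all arbitrary assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical placeholder images: 50 RGB images of 100x100 pixels.
images = rng.random((50, 100, 100, 3))
X = images.reshape(len(images), -1)  # shape (50, 30000)

# Compress each image from 30000 pixel values to 10 components.
pca = PCA(n_components=10, random_state=0)
X_low = pca.fit_transform(X)

# Cluster in the reduced feature space instead of on raw pixels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_low)
print(X_low.shape, labels[:10])
```

With features taken from a neural network instead, only the step that produces `X_low` would change; the KMeans call stays the same.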
