Can k-means clustering be used to define classifications in recognition?

Question

I'm working on a face recognition problem and trying to reduce its size. My training data started out in a 120-dimensional feature space, but with PCA I found a principal-component basis that needs only 20 dimensions while still retaining 95% of the variance in the data.
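
For readers who want to see the reduction step concretely, here is a minimal sketch of PCA with a variance target. It assumes NumPy, and the function name `pca_reduce`, the synthetic data, and the 0.95 threshold are illustrative stand-ins, not the asker's actual pipeline.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components reaching the variance target."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)  # cumulative explained-variance ratio
    # Smallest number of components whose cumulative ratio reaches the target.
    n_components = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(s))
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

# Synthetic stand-in (assumption): 500 samples in 120 dimensions that really
# live near a 20-dimensional subspace, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 120))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 120))

Z, components, mean = pca_reduce(X)
print(X.shape, "->", Z.shape)
```

A novel input `x` would be mapped into the same reduced space with `(x - mean) @ components.T` before any distance comparison.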

I began thinking that recognition is by definition a classification problem. Points in n-space belonging to the same object or face should cluster together. For example, if the training data contains 5 instances of the same individual, they should form a cluster whose midpoint (centroid) could be computed with k-means.
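
To make the idea concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy 2-D points, and the hand-picked starting centroids are assumptions for illustration; the result depends on the starting centroids, and k has to be supplied up front.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    # The returned clusters are those from the final assignment step.
    return centroids, clusters

# Toy data (assumption): two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0), (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centroids, clusters = kmeans(points, initial_centroids=[points[0], points[3]])
print(centroids)
```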

I have 100,000 observations, and each person is represented by 5-10 headshots. This means that instead of comparing a novel input to 100,000 points in my 20-space, I could compare it to 10,000-20,000 centroids. Can k-means be used like this, or have I misunderstood it? k is not known in advance, but I've been reading up on ways to find an optimal k.

My specific recognition problem doesn't use neural nets, just simple Euclidean distances between points.
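
The comparison described above (a novel input against one centroid per person, using plain Euclidean distance) might be sketched as follows. This assumes each training point carries an identity label, which the question does not state explicitly; under that assumption a centroid is just the per-person mean and no clustering step is involved. The names `class_centroids` and `nearest_centroid` and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def class_centroids(points, labels):
    """Average the points of each label to get one centroid per person."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(coords) / len(members) for coords in zip(*members)]
        for label, members in groups.items()
    }

def nearest_centroid(query, centroids):
    """Return the label whose centroid is closest to the query (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(query, centroids[label]))

# Toy stand-in (assumption): 2-D points instead of 20-D PCA coordinates.
train_points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.8), (7.0, 3.1), (6.8, 2.9), (7.2, 3.0)]
train_labels = ["alice", "alice", "alice", "bob", "bob", "bob"]

centroids = class_centroids(train_points, train_labels)
prediction = nearest_centroid((6.5, 3.4), centroids)
print(prediction)
```

Each query then costs one distance computation per person rather than one per headshot.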

There might be some merit in doing that (in Euclidean space), but there are two big downsides: (A) k-means is a heuristic and does not guarantee a global optimum (the problem is NP-hard); (B) k-means takes k as an a priori parameter and cannot choose k by itself. You did not specify many requirements for your task, but did you try a metric tree, e.g. a kd-tree or ball tree? - sascha, Dec 11 '18 at 19:56
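
As a rough illustration of the metric-tree suggestion in the comment, here is a minimal kd-tree with exact nearest-neighbour search in plain Python. The function names, the dictionary-based node layout, and the random 20-D points are assumptions for this sketch; library implementations exist (for instance in SciPy and scikit-learn) and would normally be preferred.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) of the stored point closest to the query."""
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Random stand-in data (assumption): 1,000 points in 20 dimensions.
random.seed(0)
points = [tuple(random.random() for _ in range(20)) for _ in range(1000)]
tree = build_kdtree(points)
query = tuple(random.random() for _ in range(20))

distance, match = nearest(tree, query)
brute_force = min(points, key=lambda p: math.dist(query, p))  # reference answer
print(match == brute_force)
```

The tree returns the exact nearest stored point without discarding any training data. In 20 dimensions the pruning test may eliminate few branches, so it is worth timing against a brute-force scan on the real data.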
