
I have a small set of aerial images in which the different terrains visible in each image have been labelled by human experts. For example, an image may contain vegetation, a river, rocky mountains, farmland, etc., and each image may have one or more of these labelled regions. Using this small labelled dataset, I would like to fit a Gaussian mixture model (GMM) for each of the known terrain types. Once this is done, I would have N GMMs, one for each of the N terrain types I might encounter in an image.

Now, given a new image, I would like to determine which terrain each pixel belongs to by assigning the pixel to the most probable GMM. Is this the correct line of thought? And if so, how can I go about clustering an image using GMMs?
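A minimal sketch of the approach described above. It assumes scikit-learn's GaussianMixture, raw RGB values as the per-pixel features, and an arbitrary number of mixture components per terrain. fit_terrain_gmms, classify_image and the synthetic "vegetation"/"water" pixels are hypothetical, for illustration only. Each fitted GMM gives log p(pixel | terrain). Adding the log of each terrain's share of the labelled pixels as a prior and taking the argmax assigns every pixel to its most probable terrain.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_terrain_gmms(pixels, labels, n_components=3, random_state=0):
    """Fit one GaussianMixture per terrain label (hypothetical helper).

    pixels: (n_pixels, n_features) array, here assumed to be RGB values.
    labels: (n_pixels,) integer terrain ids taken from the expert annotations.
    """
    classes = np.unique(labels)
    models, log_priors = [], []
    for c in classes:
        x = pixels[labels == c]
        gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                              random_state=random_state)
        models.append(gmm.fit(x))
        # Prior = fraction of labelled pixels belonging to this terrain.
        log_priors.append(np.log(len(x) / len(pixels)))
    return classes, models, np.array(log_priors)


def classify_image(image, classes, models, log_priors):
    """Assign every pixel of an (H, W, C) image to its most probable terrain."""
    h, w, c = image.shape
    flat = image.reshape(-1, c).astype(float)
    # score_samples returns log p(x | terrain) under each terrain's GMM.
    log_lik = np.column_stack([m.score_samples(flat) for m in models])
    log_post = log_lik + log_priors  # unnormalised log posterior per terrain
    return classes[np.argmax(log_post, axis=1)].reshape(h, w)


# Hypothetical training data: two well-separated terrains in RGB space.
rng = np.random.default_rng(0)
veg = rng.normal([40, 120, 40], 10, size=(500, 3))
water = rng.normal([30, 60, 150], 10, size=(500, 3))
pixels = np.vstack([veg, water])
labels = np.array([0] * 500 + [1] * 500)

classes, models, log_priors = fit_terrain_gmms(pixels, labels, n_components=2)
new_image = np.array([[[40, 120, 40], [30, 60, 150]]], dtype=float)  # shape (1, 2, 3)
result = classify_image(new_image, classes, models, log_priors)
print(result)
```

The number of components per terrain and the choice of features are the main knobs here. Plain RGB is only a starting assumption, and the same code works unchanged with any per-pixel feature vector.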

HuckleberryFinn

2 Answers


Intuitively, your thought process is correct. If you already have the labels, that makes this a lot easier.

For example, take a very well-known non-parametric algorithm such as K-Nearest Neighbors: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

In this algorithm, for each new pixel you would find the k labeled pixels closest to the one you are currently evaluating, where "closest" is determined by some distance function (usually Euclidean). You would then assign the new pixel the most frequently occurring label among those k neighbors.

I am not sure whether you are looking for a specific algorithm recommendation, but KNN would be a very good algorithm to begin testing this type of exercise on. I saw you tagged sklearn; scikit-learn has a very good KNN implementation (KNeighborsClassifier) that I suggest you read up on.
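A minimal sketch of pixel-wise KNN with scikit-learn's KNeighborsClassifier. It assumes RGB values as features and k = 5. The synthetic "vegetation"/"rock" training pixels and the tiny 2×2 test image are hypothetical. The image is flattened to one row per pixel, classified by majority vote among the k nearest labeled pixels (Euclidean distance), and reshaped back into a label map.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Hypothetical labeled pixels: RGB features plus expert terrain ids.
veg = rng.normal([40, 120, 40], 10, size=(300, 3))
rock = rng.normal([120, 110, 100], 10, size=(300, 3))
X_train = np.vstack([veg, rock])
y_train = np.array([0] * 300 + [1] * 300)

# Each query pixel takes the majority label of its 5 nearest labeled pixels.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)

# A new (H, W, 3) image: flatten to (H*W, 3), predict, reshape to (H, W).
image = np.array([[[42, 118, 39], [121, 108, 101]],
                  [[38, 125, 44], [118, 112, 97]]], dtype=float)
h, w, c = image.shape
label_map = knn.predict(image.reshape(-1, c)).reshape(h, w)
print(label_map)
```

Because KNN compares every query pixel against the stored labeled pixels, subsampling the training pixels may be needed to keep prediction time reasonable on large aerial images.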

artemis

It's not clustering if you use labeled training data!

You can, however, easily reuse the labeling step of GMM clustering.

For this, compute the prior probability, mean vector and covariance matrix of each labeled class, and invert the covariance matrices. Then classify each pixel of the new image by the maximum probability density (weighted by the prior probabilities) under the multivariate Gaussians estimated from the training data.
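A minimal NumPy sketch of the recipe above, assuming one Gaussian per class and RGB-like feature rows. fit_gaussians, predict and the synthetic data are hypothetical names for illustration. For each class it stores the log prior, the mean, the inverted covariance and the log-determinant. It then scores every pixel with log(prior) + log N(x | mean, cov) and picks the largest. This is essentially what scikit-learn's QuadraticDiscriminantAnalysis does.

```python
import numpy as np


def fit_gaussians(X, y):
    """Per-class log prior, mean, inverse covariance and log-determinant."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False)
        _, logdet = np.linalg.slogdet(cov)
        params[c] = (np.log(len(Xc) / len(X)), Xc.mean(axis=0),
                     np.linalg.inv(cov), logdet)
    return params


def predict(X, params):
    """Label each row of X with the class maximising log(prior) + log N(x | mean, cov).

    For an (H, W, C) image, pass image.reshape(-1, C) and reshape the result to (H, W).
    """
    classes = list(params)
    d = X.shape[1]
    scores = []
    for c in classes:
        log_prior, mean, inv_cov, logdet = params[c]
        diff = X - mean
        # Squared Mahalanobis distance of every row, computed in one einsum.
        maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        log_density = -0.5 * (maha + logdet + d * np.log(2 * np.pi))
        scores.append(log_prior + log_density)
    return np.array(classes)[np.argmax(np.column_stack(scores), axis=1)]


# Hypothetical training pixels for two classes with unequal priors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([40, 120, 40], 10, size=(400, 3)),
               rng.normal([30, 60, 150], 10, size=(200, 3))])
y = np.array([0] * 400 + [1] * 200)

params = fit_gaussians(X, y)
new_pixels = np.array([[41, 119, 42], [29, 61, 148]], dtype=float)
print(predict(new_pixels, params))
```

Working with log densities avoids numerical underflow when the feature dimension grows. np.linalg.slogdet is used instead of log(det(...)) for the same reason.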

Has QUIT--Anony-Mousse
  • You can create clusters from labeled data. In fact, that is often the point of clustering: to create labels. – artemis May 31 '19 at 15:43
  • Clustering can be used to create labels, yes. But if you *already* have expert labels, what is the point in finding inferior labels? "Given a new image ..." - HuckleberryFinn clearly asks for *classification* of the pixels in a new image according to the training labels. – Has QUIT--Anony-Mousse May 31 '19 at 19:53
  • There is something called the "label-assignment" problem. When you run any unsupervised clustering algorithm to perform pixel-wise clustering, you end up with N clusters, each of which is arbitrarily assigned a label from 0 to N-1. The ground truth (GT) can be used to assign the true labels to the clustered regions by comparing statistics of each cluster against the statistics of the known classes. – HuckleberryFinn Nov 12 '19 at 15:52
  • Most likely there does *not* exist a *good* 1-on-1 mapping. If you have labels, you'd better use them more effectively than for finding the least bad such mapping... – Has QUIT--Anony-Mousse Nov 13 '19 at 02:08
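One common way to do the label assignment described in the label-assignment comment is shown here. It is a swapped-in technique, not necessarily the commenter's method. Count how many pixels of each cluster fall under each ground-truth label, then pick the one-to-one mapping with the largest total overlap using the Hungarian algorithm (SciPy's linear_sum_assignment). match_clusters_to_labels and the toy arrays are hypothetical. As the last comment points out, the result is only the best available 1-to-1 mapping and may still be a poor one.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_clusters_to_labels(cluster_ids, true_labels):
    """Map arbitrary cluster ids to ground-truth labels by maximising pixel overlap."""
    clusters = np.unique(cluster_ids)
    labels = np.unique(true_labels)
    # overlap[i, j] = number of pixels in cluster i that carry true label j.
    overlap = np.array([[np.sum((cluster_ids == c) & (true_labels == t)) for t in labels]
                        for c in clusters])
    # linear_sum_assignment minimises total cost, so negate the overlap to maximise it.
    # With unequal numbers of clusters and labels, some stay unmatched.
    rows, cols = linear_sum_assignment(-overlap)
    return {int(clusters[r]): int(labels[k]) for r, k in zip(rows, cols)}


# Toy flattened label maps: ground truth vs. arbitrary cluster ids.
gt = np.array([0, 0, 0, 1, 1, 2, 2, 2])
cluster_ids = np.array([2, 2, 1, 0, 0, 1, 1, 1])
mapping = match_clusters_to_labels(cluster_ids, gt)
relabelled = np.array([mapping[int(c)] for c in cluster_ids])
print(mapping, relabelled)
```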