Repeatability of results when clustering a very large image

Question

I am working on clustering very large satellite images. I open each image with PIL, cluster it with K-means and Gaussian Mixture Models, and then calculate the percentage of the image that has a certain color. I've noticed that I don't always get the same results, which is expected with clustering methods, but it surprises me that I sometimes get identical results. Is this normal? Shouldn't the results be more varied?

I ran the script several times and plotted the results. There seem to be a few values that the results oscillate around. The results are listed and plotted below.

Results plot

np.unique(results, return_counts=True)
(array([17.1136831 , 18.91601425, 19.68031917, 19.69205571, 20.4079544 , 21.11455915, 21.12483615, 21.81013305, 22.62486448, 22.65183496, 22.66496585, 23.36093498]),
 array([1, 3, 2, 5, 1, 8, 2, 7, 1, 1, 3, 1], dtype=int64))
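One way to see why a handful of values might recur is that K-means is deterministic once its starting centres are fixed, and different random starts often settle into the same few partitions. The sketch below is only an illustration under stated assumptions: it uses a plain 1-D Lloyd's K-means written in pure Python rather than the library the question uses, and the synthetic `pixels` data and the helper names `kmeans_1d` and `brightest_coverage` are hypothetical.

```python
import random
from collections import Counter


def kmeans_1d(values, k, seed, n_iter=100):
    """Plain Lloyd's k-means on 1-D data; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # random start: k data points as centres
    labels = None
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values
        ]
        if new_labels == labels:  # no change means the run has converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


def brightest_coverage(values, k, seed):
    """Percentage of values assigned to the cluster with the highest centre."""
    labels, centers = kmeans_1d(values, k, seed)
    brightest = max(range(k), key=lambda j: centers[j])
    return round(100.0 * labels.count(brightest) / len(values), 4)


# Hypothetical stand-in for pixel intensities: three noisy brightness groups.
data_rng = random.Random(0)
pixels = [data_rng.gauss(mu, 10) for mu in (50, 120, 200) for _ in range(200)]

# Repeat the clustering with 30 different random starts and tally the outcomes.
outcomes = Counter(brightest_coverage(pixels, 3, seed) for seed in range(30))
for value, count in sorted(outcomes.items()):
    print(value, count)
```

Running the same seed twice returns the same percentage, so any spread in the tally comes only from the random starts. If a real pipeline exposes a seed parameter, fixing it should make runs repeatable in the same way.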

It's not unreasonable that you sometimes get identical answers for coverage%. This is based on the number of pixels in each cluster, and there are only a limited number of possibilities. The number of unique values in your plot is at maximum the range of cluster sizes (i.e. pixel count). Calculating the coverage% provides an illusion of more precision than you have. — Michael Sohnen, Feb 10 '23 at 22:50
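The point about limited possibilities can be sketched with a toy example. It assumes `labels` holds one cluster label per pixel; the function name `coverage_percent` and the eight-pixel input are hypothetical and only serve to show that the percentage can move in steps of 100 / (number of pixels) and no finer.

```python
def coverage_percent(labels, target):
    """Share of pixels, in percent, whose cluster label equals `target`."""
    return 100.0 * sum(1 for lab in labels if lab == target) / len(labels)


# Toy "image" of 8 pixels, already clustered into labels 0, 1 and 2.
labels = [0, 0, 1, 1, 1, 2, 2, 1]
n_pixels = len(labels)

cov = coverage_percent(labels, target=1)  # 4 of the 8 pixels carry label 1

# Every value the coverage could take for this image: one per pixel count.
possible = [100.0 * count / n_pixels for count in range(n_pixels + 1)]
step = 100.0 / n_pixels

print(cov, step, len(possible))
```

For a real image the step is tiny, but the coverage still depends only on an integer pixel count, so two runs that produce the same partition give exactly the same percentage down to the last digit.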
