I used GaussianMixture (GMM) from the scikit-learn package for clustering. The Python code is below.
import pandas as pd
from numpy import unique
from numpy import where
from sklearn.mixture import GaussianMixture
from matplotlib import pyplot
# load data from the first sheet of the workbook
rawData = pd.read_excel('ClusteringFailure.xlsx', sheet_name=0)
X = rawData.iloc[:, :].to_numpy(dtype='float64')
# define model and set number of clusters to 4 for genotyping
model = GaussianMixture(n_components=4)
# fit the model
model.fit(X)
# assign a cluster index to each data point
yCluster = model.predict(X)
clusters = unique(yCluster)
# plot each cluster in its own color
for cluster in clusters:
    row_ix = where(yCluster == cluster)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
# display the figure when run as a script
pyplot.show()
Here is the data I used.
x y
18.586 46.33
0.109 68.534
0.074 5.242
22.212 63.888
3.726 36.767
0.159 6.98
24.531 9.925
0.143 0.299
29.91 54.539
29.868 12.522
0.064 2.6
29.978 48.665
I ran it multiple times, and the clustering was different every time. Can anyone explain why it is inconsistent and advise on how to improve the consistency? Thanks!
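
One thing I suspect is the random initialization: as far as I understand, GaussianMixture draws a new starting point on every run unless random_state is set. Below is a minimal sketch of what I plan to try, assuming a fixed random_state plus several restarts via n_init is enough to make runs repeatable. The seed value 0, the n_init value 10, and the helper name fit_labels are my own arbitrary choices, and the twelve points are inlined from the listing above so no Excel file is needed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# The twelve (x, y) points from the question, inlined for a self-contained run.
X = np.array([
    [18.586, 46.33], [0.109, 68.534], [0.074, 5.242], [22.212, 63.888],
    [3.726, 36.767], [0.159, 6.98], [24.531, 9.925], [0.143, 0.299],
    [29.91, 54.539], [29.868, 12.522], [0.064, 2.6], [29.978, 48.665],
])


def fit_labels(seed):
    """Fit a 4-component GMM with a fixed seed and return the cluster labels.

    n_init=10 runs ten initializations and keeps the one with the best
    likelihood lower bound; the values 10 and `seed` are illustrative choices.
    """
    model = GaussianMixture(n_components=4, n_init=10, random_state=seed)
    return model.fit_predict(X)


# Two independent fits with the same seed, so the labels can be compared.
labels_a = fit_labels(0)
labels_b = fit_labels(0)
print(np.array_equal(labels_a, labels_b))
```

Even if this makes the runs repeatable, the cluster indices themselves are arbitrary labels, and with only twelve points and four components the fit may still be sensitive to which seed is chosen.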