I fitted a Gaussian mixture model (GMM) and used it to predict a cluster label for each row:
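import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

bead = df['Ce140Di']
dna = df['DNA_1']
X = np.column_stack((dna, bead))  # 2D array, one row per event: (DNA_1, Ce140Di)
#plt.scatter(X[:,0], X[:,1], s=0.5, c='black')
#plt.show()
gmm = GaussianMixture(n_components=4, covariance_type='tied')
gmm.fit(X)
labels = gmm.predict(X)  # one integer label (0-3) per row of X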
I then generated a plot as follows:
df['predicted_cluster'] = labels
fig = plt.figure()
colors = {0: 'grey', 1: 'red', 2: 'orange', 3: 'purple'}
plt.scatter(df['DNA_1'], df['Ce140Di'], c=df['predicted_cluster'].map(colors), s=0.5, alpha=0.5)
plt.show()
[Image: scatter plot colored by predictions]
Whilst I have the predicted label for each row of my df, I don't actually know which cluster it corresponds to without looking at my colors dictionary. Is there a way to do this without having to look at the scatter plot each time?
In other words, I want 0 to always correspond to my grey cluster and 1 to always be the red cluster, but the assignment changes each time I run the code.
Colors aside, how do I know the position of each cluster? What does a label of 0 mean?
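To make the "position of each cluster" part concrete, here is a minimal sketch of one possible approach, not a definitive answer. It reads the fitted centres from `gmm.means_` (row k is the centre of the component labelled k) and then renumbers the labels by sorting the centres along the first axis. The synthetic data, the fixed seeds, and the names `true_centres`, `order`, `remap` and `stable_labels` are all assumptions made for illustration; they are not part of my real code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the (DNA_1, Ce140Di) columns, not my real data.
rng = np.random.default_rng(0)
true_centres = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 1.0], [9.0, 5.0]])
X = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in true_centres])

gmm = GaussianMixture(n_components=4, covariance_type='tied', random_state=0)
labels = gmm.fit(X).predict(X)

# Row k of gmm.means_ is the fitted centre of the component labelled k.
for k, (x_c, y_c) in enumerate(gmm.means_):
    print(f"label {k}: centre at DNA_1={x_c:.2f}, Ce140Di={y_c:.2f}")

# Renumber so that 0 is always the leftmost centre, 1 the next, and so on.
order = np.argsort(gmm.means_[:, 0])         # order[j] = raw label of j-th leftmost centre
remap = np.empty(gmm.n_components, dtype=int)
remap[order] = np.arange(gmm.n_components)   # remap[raw label] = position-based label
stable_labels = remap[labels]
```

With labels tied to position like this, a fixed colors dictionary would keep pointing at the same region of the plot, provided the centres stay well separated along the axis used for sorting. Sorting on the first column is an arbitrary choice; any rule that ranks the centres unambiguously would do.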
EDIT: I believe the answer to my perhaps silly question is to use np.random.seed, but I could be wrong...
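A minimal sketch of the seeding idea, assuming synthetic data in place of my df. Instead of calling `np.random.seed`, it passes the `random_state` parameter of `GaussianMixture`, which fixes the initialisation for that one model only. The helper `fit_labels` and the seed value 42 are made up for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the two columns, not my real data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(150, 2))
               for c in ([0, 0], [4, 0], [0, 4], [4, 4])])

def fit_labels(seed):
    """Fit a fresh 4-component GMM with a fixed seed and return the labels."""
    gmm = GaussianMixture(n_components=4, covariance_type='tied', random_state=seed)
    return gmm.fit(X).predict(X)

labels_a = fit_labels(42)
labels_b = fit_labels(42)

# Same data and same seed: the two runs can be compared label by label.
print(np.array_equal(labels_a, labels_b))
```

As far as I can tell, a fixed seed only makes the labelling repeatable for the same data. It does not give label 0 any particular meaning, and a different seed or a changed dataset may still shuffle which cluster gets which number.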