I want to perform soft clustering on text data, so I am using a Gaussian Mixture Model (GMM), which should allow every text to belong to multiple clusters.
I converted the text into feature columns using TF-IDF and then applied LSA for dimensionality reduction.
I then fitted the GMM on the LSA output.
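For reference, the pipeline described above looks roughly like the sketch below. The toy corpus, the variable names, and the small component counts are assumptions for illustration only; my real data and settings differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture

# Hypothetical toy corpus; the real data set is not shown in this post.
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "my dog chased the neighbour's cat",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
    "the central bank raised interest rates",
    "the team won the football match",
    "the striker scored two goals in the match",
]

# Step 1: text -> sparse TF-IDF matrix (one column per term).
tfidf = TfidfVectorizer().fit_transform(docs)

# Step 2: LSA, i.e. truncated SVD applied to the TF-IDF matrix.
# n_components=2 is an illustrative value, not a recommendation.
X = TruncatedSVD(n_components=2, random_state=42).fit_transform(tfidf)

# Step 3: fit the mixture on the dense LSA output.
# 'diag' covariance is used here only because the toy corpus is tiny.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=42).fit(X)

# Soft assignments: one row per document, one column per cluster.
proba = gmm.predict_proba(X)
row_sums = proba.sum(axis=1)  # each row should sum to 1
```

The array `proba` is where I would expect to see probability mass shared between clusters.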
However, I am not getting soft assignments from the model. When I call predict_proba(X), each document ends up assigned to only one cluster, whereas ideally it should return probabilities for multiple clusters.
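import pandas as pd
from sklearn.mixture import GaussianMixture

# X is the dense LSA output (one row per document)
gmm = GaussianMixture(n_components=22, covariance_type='full', n_init=10, random_state=42).fit(X)
labels = gmm.predict(X)  # hard label: the most probable cluster per document
Gaussian = pd.DataFrame(gmm.predict_proba(X))  # per-cluster probabilities, one row per document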
I expect the output to be a split of probabilities across multiple clusters. Can you please help me understand why this is happening?
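To make "assigned to only one cluster" measurable, a check along these lines could be run on the predict_proba output. This is a minimal sketch: the helper names and the 0.99 threshold are hypothetical choices of mine, and the small `example` array is made up rather than taken from my results.

```python
import numpy as np

def hard_assignment_fraction(proba, threshold=0.99):
    """Fraction of rows whose largest probability exceeds `threshold`."""
    proba = np.asarray(proba, dtype=float)
    return float(np.mean(proba.max(axis=1) > threshold))

def row_entropy(proba):
    """Shannon entropy (in nats) of each row; 0 means a one-hot row."""
    proba = np.asarray(proba, dtype=float)
    safe = np.clip(proba, 1e-12, 1.0)  # avoids log(0) for zero entries
    return -(proba * np.log(safe)).sum(axis=1)

# Made-up probabilities: one-hot, evenly split, and almost one-hot.
example = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.999, 0.001, 0.0],
])

frac = hard_assignment_fraction(example)  # share of effectively hard rows
ent = row_entropy(example)                # per-row spread of probability
```

On real output, passing `gmm.predict_proba(X)` instead of `example` would show whether the rows are truly one-hot or just very peaked.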