
I want to perform soft clustering on text data so that each document can belong to multiple clusters, and I am using a Gaussian Mixture Model (GMM) for this.

I vectorized the text with TF-IDF and then applied LSA (latent semantic analysis, i.e. truncated SVD) for dimensionality reduction.
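For readers reproducing the setup, here is a minimal sketch of that preprocessing step. It assumes scikit-learn's `TfidfVectorizer` and `TruncatedSVD`; the toy corpus and `n_components=2` are hypothetical and chosen only so the example runs quickly, not taken from the original data.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; replace with the real documents.
docs = [
    "gaussian mixture models for clustering",
    "soft clustering assigns probabilities",
    "tfidf weights words by document frequency",
    "latent semantic analysis uses truncated svd",
    "mixture components have their own covariance",
    "svd reduces the dimensionality of tfidf matrices",
]

# TF-IDF vectorization followed by truncated SVD (LSA).
# n_components=2 is only for the tiny toy corpus; real data would use more.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=42),
)
X = lsa.fit_transform(docs)  # dense array: documents x LSA components
print(X.shape)  # (6, 2)
```

The resulting dense matrix `X` is what gets passed to `GaussianMixture`.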

I then fitted the GMM on the LSA output.

However, I am not getting soft clustering: when I call predict_proba(X), each document is assigned to only one cluster, whereas I expected it to return probabilities spread across several clusters.
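For reference, `predict_proba` returns each component's posterior probability (responsibility) for a sample, which is a softmax over the per-component weighted log-likelihoods $\ell_k$:

```latex
\gamma_k(x) = \frac{\pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j)}
            = \frac{\exp(\ell_k)}{\sum_{j=1}^{K} \exp(\ell_j)},
\qquad
\ell_k = \log \pi_k + \log \mathcal{N}(x \mid \mu_k, \Sigma_k)

% Worked example with K = 2 and a log-likelihood gap \ell_1 - \ell_2 = 20:
\gamma_1(x) = \frac{1}{1 + e^{-20}} \approx 1 - 2.06 \times 10^{-9}
```

Because the softmax is exponential in $\ell_k$, a gap of a few tens of nats between the best and second-best component is already enough to make a row look like a hard assignment. Whether the gaps in this particular LSA space are that large is an assumption to check, not something established here.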

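import pandas as pd
from sklearn.mixture import GaussianMixture

# X is the LSA output (documents x LSA components)
gmm = GaussianMixture(n_components=22, covariance_type='full', n_init=10, random_state=42).fit(X)
labels = gmm.predict(X)                         # hard cluster label per document
Gaussian = pd.DataFrame(gmm.predict_proba(X))   # per-document membership probabilities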
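To quantify how "hard" the `predict_proba` output actually is, a small diagnostic like the one below can help. It is a sketch: `summarize_soft_assignments` and its thresholds are hypothetical, and `make_blobs` data stands in for the real LSA matrix.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def summarize_soft_assignments(proba, hard_threshold=0.99, share_threshold=0.05):
    """Hypothetical helper: summarize how soft a predict_proba matrix is."""
    max_p = proba.max(axis=1)
    return {
        # fraction of rows where a single component takes essentially all the mass
        "frac_hard": float(np.mean(max_p >= hard_threshold)),
        # average number of components holding at least share_threshold of a row
        "mean_active_components": float(np.mean((proba >= share_threshold).sum(axis=1))),
    }


# Synthetic stand-in for the LSA output; replace with the real X.
X, _ = make_blobs(n_samples=500, n_features=50, centers=5, random_state=0)
gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=42).fit(X)
proba = gmm.predict_proba(X)
stats = summarize_soft_assignments(proba)
print(stats)
```

A `frac_hard` close to 1 and `mean_active_components` close to 1 would confirm that almost every row is dominated by a single component.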

I expected the output to be a split of probabilities across multiple clusters. Can you please help me understand why this happens?
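One way to see the mechanism is to recompute the posteriors by hand from the fitted parameters and look at the log-likelihood gap between the two best components. This sketch assumes `covariance_type="full"` (so `gmm.covariances_` holds one full matrix per component) and uses synthetic `make_blobs` data in place of the real LSA output; the variable names are illustrative.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the LSA output.
X, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42).fit(X)

# Per-component weighted log-likelihoods: log pi_k + log N(x | mu_k, Sigma_k).
log_weighted = np.column_stack([
    np.log(w) + multivariate_normal(mean=m, cov=c).logpdf(X)
    for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)
])

# Posterior = softmax over components, computed stably in log space.
manual_proba = np.exp(log_weighted - logsumexp(log_weighted, axis=1, keepdims=True))
print("matches predict_proba:", np.allclose(manual_proba, gmm.predict_proba(X)))

# Gap (in nats) between the best and second-best component for each sample.
sorted_ll = np.sort(log_weighted, axis=1)
gap = sorted_ll[:, -1] - sorted_ll[:, -2]
print("median gap:", np.median(gap))
```

If the median gap is in the tens of nats, the softmax saturates and `predict_proba` rows look like one-hot vectors even though the model is computing soft probabilities.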

kangaroo_cliff
sc_analyst
