Not getting soft clustering using Gaussian Mixture Models

Asked Aug 15 '19 at 00:35

Active Aug 18 '19 at 21:52

Viewed 149 times

I want to perform soft clustering on text data, so I am using a Gaussian Mixture Model (GMM), which should allow each document to belong to multiple clusters.

I converted the text into features using TF-IDF and then applied LSA for dimensionality reduction.

I then fitted the GMM on the LSA output.
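
For reference, a minimal sketch of the pipeline described above. The question does not show this part, so the use of scikit-learn's TfidfVectorizer and TruncatedSVD, the toy corpus `docs`, and the small component counts are all assumptions made for illustration:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

# Hypothetical toy corpus; the real documents are not shown in the question.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "my dog chased the cat",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
    "the central bank raised interest rates",
    "the team won the football match",
    "the striker scored two goals in the match",
]

# Step 1: TF-IDF turns each document into a sparse term-weight vector.
tfidf = TfidfVectorizer().fit_transform(docs)

# Step 2: LSA, assumed here to be TruncatedSVD applied to the TF-IDF matrix.
X = TruncatedSVD(n_components=3, random_state=42).fit_transform(tfidf)

# Step 3: fit a GMM on the reduced matrix (sizes are illustrative only).
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=42).fit(X)

# One row per document, one column per mixture component.
proba = gmm.predict_proba(X)
print(proba.shape)
```
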

However, I am not getting soft clustering. When I call predict_proba(X), each document is assigned to only one cluster, whereas ideally it should return probabilities spread across multiple clusters.

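import pandas as pd
from sklearn.mixture import GaussianMixture

# X is the LSA-reduced document matrix (one row per document).
gmm = GaussianMixture(n_components=22, covariance_type='full', n_init=10, random_state=42).fit(X)
labels = gmm.predict(X)  # hard assignment: most probable component per document
# Posterior probabilities: one row per document, one column per component.
Gaussian = pd.DataFrame(gmm.predict_proba(X))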

I expect the output to split the probability across multiple clusters. Can you please help me understand why this is not happening?

edited Aug 18 '19 at 21:52

kangaroo_cliff

asked Aug 15 '19 at 00:35

sc_analyst

Try *not* using a `full` model. – Has QUIT--Anony-Mousse Aug 18 '19 at 07:40
@sc_analyst Try to provide a reproducible example. It does seem the output should contain a matrix with 22 columns with posterior probabilities for the 22 clusters. – kangaroo_cliff Aug 19 '19 at 05:26
What is the shape of the `Gaussian` dataframe? – Parthasarathy Subburaj Aug 19 '19 at 15:53
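
The checks suggested in the comments above can be sketched roughly as follows. This is only an illustration: `summarize_posteriors` is a hypothetical helper, the 0.99 threshold is arbitrary, and the synthetic `X` stands in for the LSA output, which the question does not provide. It reports the shape of the posterior matrix, whether each row sums to 1, and the share of rows that are nearly one-hot, for each `covariance_type` that `GaussianMixture` accepts:

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def summarize_posteriors(proba, threshold=0.99):
    """Hypothetical helper: describe how 'hard' a posterior matrix looks."""
    top = proba.max(axis=1)  # largest membership probability in each row
    return {
        "shape": proba.shape,
        "rows_sum_to_one": bool(np.allclose(proba.sum(axis=1), 1.0)),
        "share_near_one_hot": float((top > threshold).mean()),
    }


rng = np.random.default_rng(42)
# Synthetic stand-in for the LSA output: two overlapping blobs in 5 dimensions.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 5)),
    rng.normal(1.0, 1.0, size=(200, 5)),
])

reports = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=42).fit(X)
    reports[cov_type] = summarize_posteriors(gmm.predict_proba(X))
    print(cov_type, reports[cov_type])
```

If `share_near_one_hot` is close to 1 on the real data, `predict_proba` is still returning a full probability matrix, but the fitted components are so well separated that almost all of the mass lands on a single column. Comparing that share across covariance types is one way to test the suggestion to avoid a `full` model.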

0 Answers