0

The K-means method cannot deal with anistropic points. The DBSCAN and Gaussian Mixture model seems that they can work with this according to scikit-learn. I have tried to use both approaches, but they are not working for my dataset.

DBSCAN

I used the following code:

db = DBSCAN(eps=0.1,min_samples=5 ).fit(X_train,Y_train)
labels_train=db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels_train)) - (1 if -1 in labels_train else 0)
print('Estimated number of clusters: %d' % n_clusters_)

and only 1 cluster (Estimated number of clusters: 1) was detected as shown here.

Gaussian Mixture model

The code was as follows:

gmm = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmm.fit(X_train,Y_train)
labels_train=gmm.predict(X_train)
print(gmm.bic(X_train))

The two clusters could not be distinguished as shown here.

How can i detect two clusters?

trumee
  • 393
  • 1
  • 4
  • 11

1 Answers1

0

Read the documentation.

fit(X, y=None, sample_weight=None)

X : array or sparse (CSR) matrix of shape (n_samples, n_features) [...]

...

y : Ignored

So your invocation ignores the y coordinate.

Don't we all love python/sklearn, because it doesn't even warn you of this, but silently ignores y?

X should be the entire data, not just the x coordinates.

The notion of "train" and "predict" does not make sense for clustering. Don't use it. Only use fit_predict.

Community
  • 1
  • 1
Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194