
Hi guys, I have fitted a DBSCAN model on a set of 4953 points. Now I need to find which points belong to which cluster, i.e. which input values were assigned to each cluster. I have a total of 10 clusters. How can I find this out?

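from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

# fit_predict() fits the model and returns the labels, so a separate fit() call is not needed
db = DBSCAN(eps=0.0003, min_samples=20, n_jobs=-1)
y_pred = db.fit_predict(X_scaled)
pred_labels = db.labels_  # same values as y_pred
print(len(pred_labels))
# number of clusters, not counting the noise label -1
n_clusters_ = len(set(pred_labels)) - (1 if -1 in pred_labels else 0)
print(n_clusters_)
plt.scatter(list(range(len(df_median2))), X_scaled[:, 0], c=y_pred, cmap='Paired')
plt.ylim(0.1, 0.4)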

The above is the code.

Fasty

1 Answer


You have your X_scaled array of input values (and possibly the X array of original values before scaling) and the pred_labels array of cluster labels. The two arrays have the same number of elements in the same order, so you can look up the value in pred_labels for each element of X_scaled, e.g., pred_labels[0] returns the cluster label for the first sample. If you see -1 among the labels, it is not a cluster label but a way of denoting noise samples that were not assigned to any cluster. For that reason, the value of n_clusters_ will be one lower than len(set(pred_labels)) if some samples were categorized as noise.
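As a minimal sketch of that index-based lookup, the snippet below uses a boolean mask on the labels to pull out the members of one cluster. The small X_scaled and pred_labels arrays are made-up stand-ins for the real data, not values from the question.

```python
import numpy as np

# Hypothetical stand-in data: six 2-D samples and the labels DBSCAN might return.
X_scaled = np.array([[0.10, 0.20], [0.11, 0.21], [0.50, 0.60],
                     [0.51, 0.61], [0.90, 0.05], [0.12, 0.19]])
pred_labels = np.array([0, 0, 1, 1, -1, 0])

# Label of a single sample: same index in both arrays.
first_label = pred_labels[0]

# All samples in cluster 0, and their row positions in X_scaled.
cluster0_points = X_scaled[pred_labels == 0]
cluster0_indices = np.flatnonzero(pred_labels == 0)

# Size of every group (label -1 counts the noise samples).
labels, counts = np.unique(pred_labels, return_counts=True)
cluster_sizes = dict(zip(labels.tolist(), counts.tolist()))
print(cluster_sizes)
```

The same mask works on the unscaled X array, since the row order is unchanged by scaling.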

You can also concatenate the two arrays so the cluster labels are saved side-by-side with the original samples:

import numpy as np
samples_w_lbls = np.concatenate((X_scaled,pred_labels[:,np.newaxis]),axis=1)

Then, you can filter that combined array to rows having a particular cluster label value:

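# Get rows with a cluster label value of 5
# (np.isin replaces np.in1d, which is deprecated in NumPy 2.0;
#  the name "filter" is avoided because it shadows the Python built-in):
wanted_labels = np.asarray([5])
samples_w_lbls[np.isin(samples_w_lbls[:, -1], wanted_labels)]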
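To get the members of every cluster at once rather than one label at a time, the filtering can be wrapped in a loop over the unique labels. This is only a sketch: group_by_cluster is a hypothetical helper name, and the arrays are made-up stand-ins for the real X_scaled and pred_labels.

```python
import numpy as np

# Hypothetical stand-in data: six 2-D samples and the labels DBSCAN might return.
X_scaled = np.array([[0.10, 0.20], [0.11, 0.21], [0.50, 0.60],
                     [0.51, 0.61], [0.90, 0.05], [0.12, 0.19]])
pred_labels = np.array([0, 0, 1, 1, -1, 0])


def group_by_cluster(samples, labels, include_noise=False):
    """Return {cluster_label: rows of `samples` carrying that label}."""
    groups = {}
    for label in np.unique(labels):
        # Skip the noise label -1 unless the caller asks for it.
        if label == -1 and not include_noise:
            continue
        groups[int(label)] = samples[labels == label]
    return groups


clusters = group_by_cluster(X_scaled, pred_labels)
for label, members in clusters.items():
    print(label, len(members))
```

Passing include_noise=True adds a -1 entry holding the samples that were not assigned to any cluster.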
AlexK