DBSCAN with Python and scikit-learn: What exactly are the integer labels returned by make_blobs?

Question

I'm trying to understand the example for the DBSCAN algorithm implemented in scikit-learn (http://scikit-learn.org/0.13/auto_examples/cluster/plot_dbscan.html).

I replaced the line

X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4)

with X = my_own_data, so that I can run DBSCAN on my own data.

Now, the variable labels_true, which is the second value returned by make_blobs, is used to compute several scores for the result, like this:

print "Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels)
print "Completeness: %0.3f" % metrics.completeness_score(labels_true, labels)
print "V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels)
print "Adjusted Rand Index: %0.3f" % \
    metrics.adjusted_rand_score(labels_true, labels)
print "Adjusted Mutual Information: %0.3f" % \
    metrics.adjusted_mutual_info_score(labels_true, labels)
print ("Silhouette Coefficient: %0.3f" %
       metrics.silhouette_score(D, labels, metric='precomputed'))

How can I calculate labels_true from my data X? What exactly does scikit-learn mean by "label" in this case?

Thanks for your help!

score 12 · Accepted Answer · answered Apr 04 '13 at 18:45

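labels_true is the "true" assignment of points to clusters: which cluster each point should actually belong to. Each integer is simply the index of the "blob" (the entry in centers) that the point was generated from. This is available because make_blobs knows which blob it generated each point from.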

You can't get that for your own arbitrary data X, unless you have some kind of true labels for the points (in which case you wouldn't be doing clustering anyway). This just shows some measures of how well the clustering performed in a fake case where you know the true answer.
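To make the meaning of the integers concrete, here is a small sketch. It assumes a recent scikit-learn where make_blobs is importable from sklearn.datasets; the three centers match the linked example, and random_state=0 is my own addition so the run is repeatable.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Three blob centers; random_state is an assumption added for repeatability.
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)

# labels_true[i] is the index into `centers` of the blob that produced X[i].
print(X.shape)                   # (750, 2)
print(np.unique(labels_true))    # [0 1 2]
print(np.bincount(labels_true))  # [250 250 250]

# The mean of the points carrying label k lies close to centers[k].
for k, center in enumerate(centers):
    print(k, center, X[labels_true == k].mean(axis=0).round(2))
```

So the labels are nothing more than bookkeeping from the generator, which is why there is no way to derive them from arbitrary data.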

Danica

So comment out these lines and the example runs:

#print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
#print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
#print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
#print("Adjusted Rand Index: %0.3f"
#      % metrics.adjusted_rand_score(labels_true, labels))
#print("Adjusted Mutual Information: %0.3f"
#      % metrics.adjusted_mutual_info_score(labels_true, labels))

– intotecho Apr 15 '16 at 12:10
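For reference, the one score in the question that does not use labels_true is the silhouette coefficient, so it can still be computed for your own data. A rough sketch follows, assuming a recent scikit-learn. The synthetic X is only a stand-in for my_own_data, the eps and min_samples values are arbitrary, and leaving noise points out of the score is my own choice rather than something the example does.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for my_own_data: two tight groups, no true labels.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 0.1, size=(50, 2))])

# eps and min_samples are illustrative values, not recommendations.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1; do not count it as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters: %d" % n_clusters)

# The silhouette score needs at least two clusters to be defined.
if n_clusters >= 2:
    mask = labels != -1  # assumption: leave noise points out of the score
    score = silhouette_score(X[mask], labels[mask])
    print("Silhouette Coefficient: %0.3f" % score)
```

The other five metrics compare against a known ground truth, so they only make sense when you actually have one.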
