I'm trying to understand the DBSCAN example from scikit-learn (http://scikit-learn.org/0.13/auto_examples/cluster/plot_dbscan.html).
I replaced the line
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4)
with
X = my_own_data
so that I can run DBSCAN on my own data.
Now, the variable labels_true, which is the second value returned by make_blobs, is used to calculate several metrics on the results, like this:
print "Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels)
print "Completeness: %0.3f" % metrics.completeness_score(labels_true, labels)
print "V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels)
print "Adjusted Rand Index: %0.3f" % \
    metrics.adjusted_rand_score(labels_true, labels)
print "Adjusted Mutual Information: %0.3f" % \
    metrics.adjusted_mutual_info_score(labels_true, labels)
print ("Silhouette Coefficient: %0.3f" %
       metrics.silhouette_score(D, labels, metric='precomputed'))
How can I calculate labels_true from my data X? What exactly does scikit-learn mean by "label" in this case?
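To make the question reproducible, here is a minimal sketch of roughly what I am running after that change. It is not my real code: my_own_data is a made-up placeholder (two small groups of random 2-D points), the eps and min_samples values are only illustrative, and I wrote it against a recent scikit-learn rather than 0.13. As far as I can tell, the silhouette score is the only metric from the example that I can still compute, because it does not take labels_true.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN

# Placeholder standing in for my real data (assumption: made up for illustration).
rng = np.random.RandomState(0)
my_own_data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2)),
])
X = my_own_data

# eps and min_samples are illustrative values, not tuned for any real data set.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_  # one integer per sample; -1 marks points treated as noise

# Count clusters, ignoring the noise label if it is present.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters: %d" % n_clusters)

# Needs only X and the predicted labels, so it works without labels_true.
silhouette = metrics.silhouette_score(X, labels)
print("Silhouette Coefficient: %0.3f" % silhouette)
```

The other metrics (homogeneity, completeness, V-measure, ARI, AMI) all take labels_true as their first argument, which is the part I do not know how to provide.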
Thanks for your help!