I'm pretty new to clustering and am trying out HDBSCAN, but I'm having a hard time figuring out how to get the cluster centers. With KMeans they are available on the fitted clusterer as the cluster_centers_ attribute.

How do I go about getting the cluster centers?

Here's my code:

#!/usr/bin/env python3

from sklearn.cluster import KMeans
from sklearn import metrics
import cv2
import numpy as np
import hdbscan
from pprint import pprint


# Read image into opencv
image = cv2.imread('4.jpg')

# Set color space
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# reshape the image to be a list of pixels
pixels = image.reshape((image.shape[0] * image.shape[1], 3))

# Build the clusterer
cluster = hdbscan.RobustSingleLinkage(cut=0.125, k=7)
cluster.fit(pixels)


>>> pprint(vars(cluster))
{'_cluster_hierarchy_': <hdbscan.plots.SingleLinkageTree object at 0x110deda58>,
 '_metric_kwargs': {},
 'algorithm': 'best',
 'alpha': 1.4142135623730951,
 'core_dist_n_jobs': 4,
 'cut': 0.125,
 'gamma': 5,
 'k': 7,
 'labels_': array([  0,   0,   0, ..., 360, 220, 172]),
 'metric': 'euclidean'}

By contrast, this is what the fitted KMeans object gives you:

{'cluster_centers': (array([ 64.93473757,  65.65262431,  72.00103591]),
                     array([  77.55381605,   85.80626223,  102.29549902]),
                     array([ 105.66884532,  115.81917211,  131.55555556]),
                     array([ 189.20149254,  197.00497512,  205.43034826]),
                     array([ 148.0922619 ,  156.5       ,  168.33333333])),
 'cluster_centers_': array([[ 105.66884532,  115.81917211,  131.55555556],
       [  64.93473757,   65.65262431,   72.00103591],
       [ 148.0922619 ,  156.5       ,  168.33333333],
       [ 189.20149254,  197.00497512,  205.43034826],
       [  77.55381605,   85.80626223,  102.29549902]]),
 'copy_x': True,
 'inertia_': 1023155.888923295,
 'init': 'k-means++',
 'labels_': array([1, 1, 1, ..., 1, 1, 1], dtype=int32),
 'max_iter': 300,
 'n_clusters': 5,
 'n_init': 10,
 'n_iter_': 8,
 'n_jobs': 1,
 'precompute_distances': 'auto',
 'random_state': None,
 'tol': 0.0001,
 'verbose': 0}
stwhite

1 Answer

Clusters in (H)DBSCAN do not have centers.

The clusters may be non-convex, so if you compute the average of all points in a cluster (assuming your data are points at all, which they don't need to be), the result may lie outside the cluster.
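
To make the non-convex case concrete, here is a small sketch using a made-up ring of 2-D points (not the image data from the question). The names `ring`, `mean`, and `gap` are illustrative only. The mean lands in the middle of the ring, about one radius away from every actual member.

```python
import numpy as np

# Hypothetical non-convex "cluster": 200 points evenly spaced on the unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])

# The average of all members sits at the centre of the ring...
mean = ring.mean(axis=0)

# ...which is far from every point that actually belongs to the cluster.
gap = np.linalg.norm(ring - mean, axis=1).min()

print("mean:", mean)
print("distance from mean to nearest member:", gap)
```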

Also note that DBSCAN produces noise points, which belong to no cluster and therefore have no center at all.
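
If you still want a single representative value per cluster (for example a typical color), one option is to compute it yourself from the labels. The following is only a sketch, not something the hdbscan library provides: `cluster_summaries` is a hypothetical helper, and it assumes noise is labeled -1, which is worth checking for the clusterer you use. It returns both the mean and the member nearest to that mean, since the latter is always a real data point.

```python
import numpy as np

def cluster_summaries(points, labels, noise_label=-1):
    """Return {label: (mean, nearest_member)} for every non-noise cluster.

    Hypothetical helper; noise_label=-1 is an assumption about the clusterer.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    summaries = {}
    for label in np.unique(labels):
        if label == noise_label:
            continue  # noise points belong to no cluster, so skip them
        members = points[labels == label]
        mean = members.mean(axis=0)
        # The mean may fall outside a non-convex cluster, so also keep
        # the actual member that lies closest to it.
        distances = np.linalg.norm(members - mean, axis=1)
        nearest_member = members[np.argmin(distances)]
        summaries[int(label)] = (mean, nearest_member)
    return summaries

# Tiny made-up example: two clusters plus one noise point.
demo_points = [[0, 0], [2, 0], [10, 10], [50, 50]]
demo_labels = [0, 0, 1, -1]
demo = cluster_summaries(demo_points, demo_labels)
```

With the code from the question, this would be called as cluster_summaries(pixels, cluster.labels_). Whether the mean is meaningful still depends on the shape of each cluster.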

Has QUIT--Anony-Mousse
  • Thanks for the answer and for marking this as a duplicate. That other question was worded differently so it wasn't coming up in my searches. I'll need to rephrase my question and code to narrow down the problem + original question. – stwhite May 11 '17 at 18:52