Clusters merge threshold

Question

I'm working with mean shift. The procedure computes, for every point in the data set, the location to which that point converges. I can also compute the Euclidean distance between the locations where two distinct points converged, but I have to choose a threshold: if (distance < threshold), the two points belong to the same cluster and I can merge them.

How can I find the correct value to use as the threshold?
(Any value can be used, and the result depends on it, but I need the optimal value.)

please choose your tags carefully in the future: placing wrong tags will most likely cause your question to be invisible to people who may know how to answer it. — Shai, Jan 24 '13 at 19:20
Have you thought of using a pill-box kernel? It has better convergence properties. — Shai, Jan 24 '13 at 19:21
Pill-box kernel? I don't know what that is... but my problem doesn't concern convergence properties. I only have to set an appropriate threshold to merge points into clusters; I just don't know how to choose the best value! (It is a bit like choosing k in k-means.) — Federico Catalano, Jan 24 '13 at 19:25
In mean-shift clustering each cluster is represented as a distinct "basin of attraction" in the induced density. If close-by data points converge to **different** modes of the density function, then your kernel is not smooth enough: you have too many local modes. You need a kernel that is more localized and smoother. One such kernel is a finite-support uniform kernel (aka pillbox kernel). — Shai, Jan 24 '13 at 19:29
I have 3D points. So if one point converges to, e.g., (11.345, 23.896, 87.52) and another to (11.789, 23.24, 87.25), they don't belong to the same cluster, and the problem is that my kernel is not smooth enough? (So the convergence points have to be exactly the same, right?) Good to know... where can I find some examples of this pillbox kernel? — Federico Catalano, Jan 24 '13 at 19:51
The pill-box kernel is probably also known as the "box" kernel. Indeed, it should give you the exact same convergence point where a Gaussian kernel will still vary (because the box kernel does not weight neighbors). But have you tried using a threshold such as 0.1 * kernel width? That should usually just work. — Has QUIT--Anony-Mousse, Jan 25 '13 at 07:50
@Shai his question is: with a smooth kernel such as Gaussian, *when are two modes the same*? — Has QUIT--Anony-Mousse, Jan 25 '13 at 07:51
@Anony-Mousse and my answer is: by definition, two clusters are the same when they converge to the same mode. If your density estimate is too "un-smooth" (I don't know the word for it), then you have a problem in your settings: either smooth more (wider Gaussian kernel), or use a different kernel that is more localized. Either way, you have to know what you are doing; just throwing thresholds at a principled clustering algorithm will do you no good if you do not understand what is going on. — Shai, Jan 25 '13 at 09:15
I don't think further increasing the kernel bandwidth or using a "stupid kernel" is the best idea. Instead, one has to realize that some modes may need to be considered the same. Anyway, do you actually know a formal definition of mean-shift for actual clustering? I've only seen very fuzzy descriptions of the principle, but nothing you could validate an implementation against. — Has QUIT--Anony-Mousse, Jan 25 '13 at 09:31
For practical use, you *do* want to have mean-shift where a cluster *may* consist of several modes, if they are close to each other. — Has QUIT--Anony-Mousse, Jan 25 '13 at 09:32
1) First of all, I found an error in my code: points stopped at least one step before convergence, and I've fixed it. 2) The only way to simulate mathematical convergence is to compute the distance from the point to the mean of the window; if this distance is very small (I've set 1e-3), the point has reached convergence, right? — Federico Catalano, Jan 25 '13 at 18:23
3) Since we use a tolerance to check convergence, the points will not all end up at exactly the same value; a small difference (on the order of 1e-3 in my example) can remain. 4) So I've set the second threshold to the same value as the first, so that the clustering depends only on the bandwidth. Am I doing it right, or am I still in the dark? — Federico Catalano, Jan 25 '13 at 18:24

score 0 · Answer 1 · answered Jun 24 '14 at 02:39

I've implemented mean-shift clustering several times and have run into this same issue. Depending on how many iterations you're willing to shift each point for, or what your termination criterion is, there is usually some post-processing step where you have to group the shifted points into clusters. Points that theoretically shift to the same mode need not, in practice, end up directly on top of each other.

I think the best and most general way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past my code to do this post-processing has usually looked something like this:
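To make the termination issue concrete, here is a minimal sketch of shifting a single point, assuming a Gaussian kernel and a step-size tolerance of 1e-3 as mentioned in the comments. The function name `shift_point`, the `max_iter` cap, and the toy 2D data are all made up for illustration; this is not taken from any particular library.

```python
import math

def shift_point(start, data, bandwidth, tol=1e-3, max_iter=500):
    """Shift `start` towards a density mode; stop once the step is below `tol`."""
    x = tuple(start)
    for _ in range(max_iter):
        # Gaussian weight of every data point relative to the current position.
        weights = [math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2)) for q in data]
        total = sum(weights)
        # Weighted mean of the data, computed per coordinate.
        new_x = tuple(
            sum(w * q[d] for w, q in zip(weights, data)) / total
            for d in range(len(x))
        )
        step = math.dist(x, new_x)
        x = new_x
        if step < tol:  # "converged" only up to the tolerance
            break
    return x

# Hypothetical toy data: two well-separated groups of 2D points.
data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
modes = [shift_point(p, data, bandwidth=1.0) for p in data]
```

Because each point stops as soon as its own step falls below `tol`, points heading for the same mode generally agree only approximately, which is why a grouping step is still needed afterwards.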

threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
    cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
    if cluster is None:
        # no cluster is close enough: start a new one with p as its first point
        clusters.append([p])
    else:
        # p is within the threshold of an existing cluster: add it there
        cluster.append(p)

For the findExistingClusterWithinThresholdOfPoint function I usually use the minimum distance of p to each currently defined cluster.
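A self-contained sketch of the whole post-processing step might look like the following. It reads "minimum distance of p to each cluster" as the distance from p to the nearest member of the cluster, which is one possible interpretation, not necessarily the exact one used above. The snake_case names, the `factor` parameter, and the bandwidth of 2.0 are assumptions for illustration; the first two sample points are the ones quoted in the comments.

```python
import math

def find_existing_cluster_within_threshold(p, clusters, threshold):
    """Return the closest cluster whose nearest member is within `threshold` of p, else None."""
    best, best_dist = None, threshold
    for cluster in clusters:
        # Distance from p to the nearest point already in this cluster.
        d = min(math.dist(p, q) for q in cluster)
        if d < best_dist:
            best, best_dist = cluster, d
    return best

def group_shifted_points(shifted_points, kernel_bandwidth, factor=0.5):
    """Greedy single pass: each point joins a nearby cluster or starts a new one."""
    threshold = factor * kernel_bandwidth
    clusters = []
    for p in shifted_points:
        cluster = find_existing_cluster_within_threshold(p, clusters, threshold)
        if cluster is None:
            clusters.append([p])
        else:
            cluster.append(p)
    return clusters

# Two nearby convergence points (from the comments) plus a made-up distant one.
shifted = [(11.345, 23.896, 87.52), (11.789, 23.24, 87.25), (40.0, 2.0, 3.0)]
clusters = group_shifted_points(shifted, kernel_bandwidth=2.0)  # bandwidth is assumed
```

Note that this is a greedy, order-dependent pass: if a later point lies within the threshold of two existing clusters, it joins the closer one and the two clusters are not merged.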

This seems to work pretty well. Hope this helps.
