I've implemented mean-shift clustering several times and have run into this same issue. Depending on how many iterations you're willing to shift each point for, or what your termination criteria is, there is usually some post-processing step where you have to group the shifted points into clusters. Points that theoretically shift to the same mode need not practically end up on directly top of each other.
I think the best and most general way to do this is to use a threshold based on the kernel bandwidth, as suggested in the comments. In the past my code to do this post processing has usually looked something like this:
threshold = 0.5 * kernel_bandwidth
clusters = []
for p in shifted_points:
cluster = findExistingClusterWithinThresholdOfPoint(p, clusters, threshold)
if cluster == null:
// create new cluster with p as its first point
newCluster = [p]
clusters.add(newCluster)
else:
// add p to cluster
cluster.add(p)
For the findExistingClusterWithinThresholdOfPoint
function I usually use the minimum distance of p
to each currently defined cluster.
This seems to work pretty well. Hope this helps.