I'm using mean shift clustering to remove unwanted noise from my input data. The data can be found here. Here is what I have tried so far:

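import numpy as np
from sklearn.cluster import MeanShift

# unpack=True transposes the file contents, so data has shape (3, 500)
data = np.loadtxt('model.txt', unpack=True)
# scikit-learn expects (n_samples, n_features), so pass the transpose, shape (500, 3)
ms = MeanShift()
ms.fit(data.T)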

After trying several bandwidth values I still get only one cluster, but the outliers and noise shown in the picture are supposed to end up in a different cluster.

enter image description here

When I decrease the bandwidth a little more, I end up with the result below, which is again not what I was looking for.

enter image description here

Can anyone help me with this?

jquery404
  • Note: I've retagged your post to Python because you're using `numpy` and `sklearn`, not MATLAB. In any case, to me those "outliers" are rather subjective. Why do you believe those points are outliers? What qualitative / quantitative observations have you made to determine those are outliers? If you can't answer this, then getting a machine learning / clustering algorithm to remove what you can't describe in detail is going to be rather difficult. It would also help if you provided the original input data so we can reconstruct your problem. – rayryeng Jun 26 '15 at 15:22
  • @rayryeng Hi, I have included the input data. I have a model from which I generated these points; in this case it is a bunny. Points that are far away from the actual surface, or points in less dense areas, are considered outliers. As you can tell, the circled point is more or less isolated compared to the others. – jquery404 Jun 26 '15 at 15:47

3 Answers

You can remove outliers before using mean shift.

Statistical removal

For example, fix the number of neighbors to analyze for each point (e.g. 50) and a standard deviation multiplier (e.g. 1). For every point, compute the mean distance to its neighbors; all points whose mean distance is more than 1 standard deviation above the global mean of those distances will be marked as outliers and removed. This technique is used in libpcl, in the class pcl::StatisticalOutlierRemoval, and a tutorial can be found here.
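The statistical filter described above can be sketched in Python roughly as follows. This is not the PCL implementation, only an approximation of the same idea built on scikit-learn's NearestNeighbors; the function name statistical_outlier_mask and its parameter names are hypothetical, and the demo data is synthetic.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def statistical_outlier_mask(points, n_neighbors=50, std_mult=1.0):
    """Return a boolean mask that is True for the points to keep.

    points is expected to have shape (n_points, n_dims).
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(points)
    # Called without a query, kneighbors() does not count a point as its own neighbor.
    distances, _ = nn.kneighbors()
    mean_dist = distances.mean(axis=1)
    # Global threshold: mean of the per-point mean distances plus std_mult standard deviations.
    threshold = mean_dist.mean() + std_mult * mean_dist.std()
    return mean_dist <= threshold


# Synthetic demo: a tight blob plus one far-away point (made-up data).
rng = np.random.default_rng(0)
demo_points = np.vstack([rng.normal(scale=0.1, size=(300, 3)), [[5.0, 5.0, 5.0]]])
demo_mask = statistical_outlier_mask(demo_points, n_neighbors=20, std_mult=1.0)
filtered = demo_points[demo_mask]
```

The surviving points (filtered) could then be passed to mean shift or any other clustering step.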


Deterministic removal (radius based)

A simpler technique consists in specifying a radius R and a minimum number of neighbors N. All points that have fewer than N neighbors within a radius of R will be marked as outliers and removed. This technique is also used in libpcl, in the class pcl::RadiusOutlierRemoval, and a tutorial can be found here.
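A minimal Python sketch of the radius-based filter, again using scikit-learn rather than PCL. The function name radius_outlier_mask is hypothetical, and not counting a point as its own neighbor is an assumption of this sketch that may differ from PCL's convention.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def radius_outlier_mask(points, radius, min_neighbors):
    """Return a boolean mask that is True for points with enough neighbors within radius."""
    nn = NearestNeighbors(radius=radius).fit(points)
    # Without a query argument, a point is not returned as its own neighbor.
    neighborhoods = nn.radius_neighbors(return_distance=False)
    counts = np.array([len(idx) for idx in neighborhoods])
    return counts >= min_neighbors


# Synthetic demo: a tight blob plus one isolated point (made-up data).
rng = np.random.default_rng(0)
demo_points = np.vstack([rng.normal(scale=0.1, size=(300, 3)), [[5.0, 5.0, 5.0]]])
demo_mask = radius_outlier_mask(demo_points, radius=0.3, min_neighbors=5)
filtered = demo_points[demo_mask]
```

Both R and N depend on the sampling density of the point cloud, so they usually need some tuning per data set.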


fferri

Mean-shift is not meant to remove low-density areas.

It tries to move all data to the most dense areas.

If there is one single most dense point, then everything should move there, and you get only one cluster.

Try a different method. Maybe remove the outliers first.

Has QUIT--Anony-Mousse
  • Any suggestions on how to remove the outliers from this data? – jquery404 Jun 27 '15 at 03:51
  • They don't look like outliers to me, so I'm not convinced anything will work. But try the usual methods (kNN, LOF, LoOP, ...). Also try density estimation techniques such as kernel density estimates; maybe you can find a density threshold. – Has QUIT--Anony-Mousse Jun 27 '15 at 06:08
  • As far as I know, mean shift itself needs kernel density estimation to estimate the density, and then it shifts the mean toward the high-density region. So I thought it would already do what you suggest with KDE; why is it not working? – jquery404 Jun 27 '15 at 14:49
  • MeanShift is not looking for low-density points, but for high density. – Has QUIT--Anony-Mousse Jun 28 '15 at 04:53
  • Thank you so much for your help. Could you please direct me to a kernel density estimation implementation for 3D data in Python or any other language? – jquery404 Jun 28 '15 at 06:36
  • Try scipy; it should have KDE. Or implement it yourself: it is textbook statistics, nothing special. – Has QUIT--Anony-Mousse Jun 28 '15 at 06:38
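The density-threshold idea from the comments above can be sketched with scipy.stats.gaussian_kde, which accepts multi-dimensional data. The function name kde_density_mask and the choice of dropping the lowest 5% of points by estimated density are assumptions made for illustration; a suitable threshold depends on the data.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_density_mask(points, quantile=0.05):
    """Return a boolean mask that is True for points above a density quantile.

    points is expected to have shape (n_points, n_dims).
    """
    # gaussian_kde expects shape (n_dims, n_points), hence the transpose.
    kde = gaussian_kde(points.T)
    density = kde(points.T)
    # Assumed rule: drop the fraction `quantile` of points with the lowest density.
    threshold = np.quantile(density, quantile)
    return density > threshold


# Synthetic demo: a tight blob plus one isolated point (made-up data).
rng = np.random.default_rng(0)
demo_points = np.vstack([rng.normal(scale=0.1, size=(300, 3)), [[5.0, 5.0, 5.0]]])
demo_mask = kde_density_mask(demo_points, quantile=0.05)
filtered = demo_points[demo_mask]
```

Note that a quantile threshold always removes a fixed share of points, whether or not they are real outliers, so an absolute density cutoff may be preferable once a sensible value is known.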

Set the cluster_all parameter of MeanShift to False. From the documentation: cluster_all : bool, default=True. If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
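A small sketch of how cluster_all=False might be used to flag isolated points. The data is synthetic, and the bandwidth value as well as bin_seeding=True with min_bin_freq=5 are assumptions of this sketch: with the default seeding every point is used as a seed, so an isolated point can end up as the center of its own cluster instead of receiving label -1.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Synthetic demo: a tight blob plus one isolated point (made-up data).
rng = np.random.default_rng(0)
blob = rng.normal(scale=0.1, size=(200, 3))
stray = np.array([[10.0, 10.0, 10.0]])
points = np.vstack([blob, stray])

# bin_seeding with min_bin_freq > 1 keeps sparse bins from becoming seeds (assumed settings).
ms = MeanShift(bandwidth=1.0, bin_seeding=True, min_bin_freq=5, cluster_all=False)
labels = ms.fit_predict(points)

# Orphans, i.e. points farther than the bandwidth from every cluster center, get label -1.
inliers = points[labels != -1]
```

Whether this separates noise on real data such as the bunny model depends heavily on the bandwidth and seeding settings.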

user2987264